Responsibilities:
- Design technical solutions for data acquisition and storage into our centralized data repository.
- Develop ELT scripts, design data-driven logic and conduct unit testing.
- Conduct database modeling and design to improve overall performance.
- Produce design artifacts and documentation that enable future support of the implemented solutions.
- Investigate and resolve incidents, identifying whether the problem is caused by the data-loading code or by bad data received from the data provider.
- Execute service requests related to routine and ad-hoc data loads.
- Perform data quality checks and report on data quality issues (a minimal sketch follows this list).
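
As a rough illustration of the data quality responsibility above, here is a minimal sketch, assuming a Databricks/PySpark environment; the Delta path and column names (orders, order_id, amount) are hypothetical placeholders, not part of the role's actual codebase:

```python
# Minimal data-quality gate for an incoming load, assuming Databricks/PySpark.
# The path and column names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.format("delta").load("/mnt/raw/orders")  # hypothetical path

# Count rows that violate basic expectations from the provider contract.
checks = df.select(
    F.count("*").alias("total_rows"),
    F.sum(F.when(F.col("order_id").isNull(), 1).otherwise(0)).alias("null_ids"),
    F.sum(F.when(F.col("amount") < 0, 1).otherwise(0)).alias("negative_amounts"),
).first()

if checks["null_ids"] or checks["negative_amounts"]:
    # Bad data from the provider: surface it instead of loading silently,
    # which helps separate provider issues from data-loading code defects.
    raise ValueError(
        f"Data quality failure: {checks['null_ids']} null IDs, "
        f"{checks['negative_amounts']} negative amounts "
        f"out of {checks['total_rows']} rows"
    )
```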
Requirements:
MUST HAVES:
10+ years of experience, including:
- Designing and developing scalable Medallion Data Lakehouse architectures.
- Expertise in data ingestion, transformation, and curation using Delta Lake and Databricks (an illustrative bronze-to-silver sketch follows this list).
- Experience integrating structured and unstructured data sources into star/snowflake schemas.
- Building, automating, and optimizing complex ETL/ELT pipelines using Azure Data Factory (ADF), Databricks (PySpark, SQL, Delta Live Tables), and dbt.
- Implementing orchestrated workflows and job scheduling in Azure environments.
- Strong knowledge of relational databases (SQL Server, Synapse, PostgreSQL) and dimensional modeling.
- Advanced SQL query optimization, indexing, partitioning, and data replication strategies.
- Experience with Apache Spark, Delta Lake, and distributed computing frameworks in Azure Databricks.
- Working with Parquet, ORC, and JSON formats for optimized storage and retrieval.
- Deep expertise in Azure Data Lake Storage (ADLS), Azure Synapse Analytics, Azure SQL, Event Hubs, and Azure Functions.
- Strong understanding of cloud security, RBAC, and data governance.
- Proficiency in Python (PySpark), SQL, and PowerShell for data engineering workflows.
- Experience with CI/CD automation (Azure DevOps, GitHub Actions) for data pipelines.
- Implementing data lineage, cataloging, metadata management, and data quality frameworks.
- Experience with Unity Catalog for managing permissions in Databricks environments (a short permissions sketch follows this list).
- Expertise in Power BI (DAX, data modeling, performance tuning).
- Experience in integrating Power BI with Azure Synapse and Databricks SQL Warehouses.
- Familiarity with MLflow, AutoML, and embedding AI-driven insights into data pipelines.
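
To illustrate the Medallion/Delta Lake expertise listed above, here is a minimal bronze-to-silver upsert sketch, assuming Databricks with Delta Lake; the paths, table layout, and column names (events, event_id, event_ts) are hypothetical:

```python
# Illustrative bronze-to-silver step in a Medallion layout, assuming Databricks
# with Delta Lake. All paths and column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: raw ingested events, appended as-is from the source.
bronze = spark.read.format("delta").load("/mnt/bronze/events")

# Silver: deduplicated, typed, and conformed records.
updates = (
    bronze
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .dropDuplicates(["event_id"])
)

# Merge incrementally into the silver table rather than rewriting it.
silver = DeltaTable.forPath(spark, "/mnt/silver/events")
(
    silver.alias("t")
    .merge(updates.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```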
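
And for the Unity Catalog item, a short sketch of granting table-level permissions, assuming a Unity Catalog-enabled workspace; the catalog, schema, table, and group names are placeholders:

```python
# Hypothetical Unity Catalog permission grants, run from a Databricks notebook
# or job; `main.silver.events` and `data_analysts` are placeholder names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow the analyst group to read the curated table, and nothing more.
spark.sql("GRANT SELECT ON TABLE main.silver.events TO `data_analysts`")

# Verify what the group can now do.
spark.sql("SHOW GRANTS ON TABLE main.silver.events").show(truncate=False)
```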