We are seeking a highly skilled and experienced Senior Data Engineer to join our dynamic team. In this role, you will be responsible for designing, building, and maintaining scalable and reliable data pipelines and infrastructure on AWS. You will play a critical role in enabling our data-driven decision-making processes by ensuring the availability and quality of our data. The ideal candidate will possess a strong background in AWS cloud services, Python, SQL, PySpark, Airflow, and infrastructure as code (CDK). Experience with DevOps practices is a significant plus.
**Responsibilities:**
- **Data Pipeline Development:** Design, develop, and maintain robust and scalable data pipelines using Python, PySpark, and Airflow to ingest, process, and transform large datasets.
- **Cloud Infrastructure (AWS):** Architect, build, and manage data infrastructure on AWS using services such as S3, EC2, EMR, Redshift, Glue, and Lambda.
- **Infrastructure as Code (CDK):** Implement and manage infrastructure as code using AWS CDK to ensure consistency, repeatability, and scalability of our data platform.
- **Database Management:** Design and optimize database schemas and queries using SQL for efficient data storage and retrieval.
- **Data Quality and Testing:** Implement comprehensive unit testing strategies to ensure data quality and pipeline reliability.
- **Performance Optimization:** Identify and resolve performance bottlenecks in data pipelines and infrastructure.
- **Collaboration:** Work closely with data scientists, analysts, and other engineers to understand data requirements and deliver effective solutions.
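To give candidates a concrete feel for the pipeline work above, here is a minimal sketch of a record-normalization step of the kind that might run inside a PySpark job or an Airflow `PythonOperator` task. The field names (`user_id`, `ts`, `amount`) are illustrative assumptions, not a real schema; it is shown as plain Python for clarity.

```python
from datetime import datetime, timezone

def normalize_event(raw: dict) -> dict | None:
    """Normalize one raw event record; return None for malformed input.

    Illustrative transform step: in practice this logic might live in a
    PySpark job or an Airflow task. Field names here are hypothetical.
    """
    # Quarantine records missing required keys rather than failing the run.
    if "user_id" not in raw or "ts" not in raw:
        return None
    return {
        "user_id": str(raw["user_id"]),  # coerce IDs to a single type
        # Epoch seconds -> ISO-8601 UTC timestamp
        "event_time": datetime.fromtimestamp(
            int(raw["ts"]), tz=timezone.utc
        ).isoformat(),
        "amount": float(raw.get("amount", 0.0)),  # default missing amounts
    }
```

Keeping transforms as small pure functions like this makes them easy to unit test independently of the orchestrator.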
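On the infrastructure-as-code side, a CDK stack in this role might look like the following sketch, which defines a single versioned, encrypted S3 bucket for raw data. This is a hedged illustration using AWS CDK v2 for Python; the stack and bucket names are placeholders, not part of any real deployment.

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(Stack):
    """Hypothetical stack provisioning a raw-data landing bucket."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Versioning and server-side encryption are typical defaults
        # for a data-lake landing zone; RETAIN avoids accidental data loss
        # when the stack is torn down.
        s3.Bucket(
            self,
            "RawDataBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            removal_policy=RemovalPolicy.RETAIN,
        )

app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```

Defining resources this way keeps environments reproducible and reviewable through ordinary pull requests.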
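The data-quality responsibility often reduces to small, testable checks run against pipeline output. As a sketch (column names assumed for illustration), a null-check helper and the kind of assertion a unit test would make:

```python
def columns_with_nulls(rows: list[dict], required: list[str]) -> list[str]:
    """Return required column names that are null or missing in any row.

    Illustrative data-quality check; a real pipeline might run the
    equivalent as a PySpark aggregation or a post-load SQL assertion.
    """
    return [
        col
        for col in required
        if any(row.get(col) is None for row in rows)
    ]

# Example usage, e.g. inside a pytest test for a pipeline stage:
rows = [
    {"user_id": "1", "amount": 3.5},
    {"user_id": "2", "amount": None},
]
bad = columns_with_nulls(rows, ["user_id", "amount"])
```

Checks like this run cheaply in CI and catch schema drift before it reaches downstream consumers.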