Our client is seeking an experienced Senior Data Engineer to join their team and play a pivotal role in building and managing large-scale data pipelines, with a focus on supporting the development of Large Language Models (LLMs) and agent-based applications. In addition to contributing your technical expertise, you will manage and mentor junior data engineers, helping them grow while ensuring high standards of data engineering practice across the team.
This is a hybrid 12-month contract with 2 days in the downtown Toronto office.
Responsibilities:
Design and implement scalable and reliable data pipelines to handle increasing data complexity and volume for LLM and agent applications.
Develop and optimize data infrastructure to meet the needs of predictive modeling, machine learning, and generative AI applications.
Work closely with data scientists, machine learning engineers, and business stakeholders to understand data requirements and deliver high-quality data solutions.
Extract, transform, and load (ETL) large datasets from a variety of structured and unstructured data sources using APIs and other technologies.
Create and maintain clear, concise technical documentation for data engineering workflows, pipelines, and processes.
Foster a collaborative environment by mentoring junior team members in best practices, new technologies, and approaches in data engineering.
Oversee the work of junior data engineers, providing mentorship and guidance to drive the successful execution of projects.
Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field. An advanced degree is a plus.
5+ years of experience in data engineering or related roles, with at least 2 years of experience working with LLMs, agent-based applications, or similar advanced machine learning technologies.
Advanced skills in SQL, Python, and ETL frameworks for building data pipelines.
Strong experience working with APIs to extract, transform, and load data from multiple sources, including structured and unstructured data formats (e.g., JSON, XML).
Familiarity with machine learning models, including large language models (LLMs) and generative AI techniques, and an understanding of how to build and optimize data pipelines to support these applications.
In-depth knowledge of data modeling and storage solutions for both structured and unstructured data, as well as cloud data technologies like Google BigQuery and Azure Data Lake.
Strong leadership abilities to mentor and lead junior engineers. Excellent communication and collaboration skills to work cross-functionally with other teams.
Proven ability to address complex data challenges, with a strong focus on data optimization, performance, and quality assurance.