Job Description:

Responsibilities:

  • Build data pipelines to enable training of LLMs
  • Work with technical and non-technical stakeholders to collect requirements, integrate with data systems and deliver large-scale datasets
  • Operate in a complex multi-cloud data lakehouse, building end-to-end integrated workflows
  • Work with privacy, security, and other policy stakeholders to design, review and implement solutions to automate compliant ETL for downstream teams

Requirements:

  • Availability for long-term business travel, spending most of the year in the U.S., working in the customer’s office
  • Strong understanding of machine learning principles, especially in the context of LLMs
  • 5+ years of experience in big data engineering
  • Proficiency in Java, Scala, Spark
  • Proficiency in PySpark; Go (Golang) preferred
  • Experience building and leveraging large-scale data infrastructure: MapReduce/Hadoop, cloud-native deployments
  • Experience with alerting, monitoring, and remediation automation in a large-scale distributed environment
  • BS/BA or equivalent degree in computer science or a similar field (preferred)

We offer:

  • Opportunity to work on bleeding-edge projects
  • Work with a highly motivated and dedicated team
  • Competitive salary
  • Flexible schedule
  • Benefits package – medical insurance, sports
  • Corporate social events
  • Professional development opportunities
  • Well-equipped office

