Job ID: JOB_ID_9036
About the Role:
As part of the Mail Analytics Data Engineering team, you will be instrumental in building and maintaining large-scale batch pipelines, data serving systems, data lakehouses, and analytics systems. Your work will enable mission-critical decision-making, power downstream applications, and support AI-driven capabilities.
If you are passionate about constructing robust data infrastructure and platforms that drive modern data- and AI-driven businesses at scale, we encourage you to apply!
Your Day-to-Day Responsibilities:
- Partner with Data Science, Product, and Engineering teams to gather requirements and define the data ontology for Mail Data & Analytics.
- Lead and mentor junior Data Engineers, supporting Yahoo Mail’s evolving data needs.
- Design, build, and maintain efficient and reliable batch data pipelines to populate core data sets.
- Develop scalable frameworks and tooling to automate analytics workflows and streamline user interactions with data products.
- Establish and promote standard methodologies for data operations and lifecycle management.
- Build new large-scale data infrastructure and systems for data processing or serving, or improve and maintain existing ones, optimizing complex code through advanced algorithmic concepts and a deep understanding of the underlying data system stacks.
- Create and contribute to frameworks that enhance the efficacy of data platform and system management and deployment, while working with data infrastructure to triage and resolve issues.
- Prototype new metrics or data systems.
- Define and manage Service Level Agreements (SLAs) for all data sets within your areas of ownership.
- Develop complex queries, very high-volume data pipelines, and analytics applications to solve challenging data engineering problems.
- Collaborate with engineers, data scientists, and product managers to understand business problems and technical requirements, delivering effective data solutions.
- Provide engineering consulting on large, complex data lakehouse initiatives.
You Must Have:
- BS in Computer Science/Engineering, a relevant technical field, or equivalent practical experience, with a specialization in Data Engineering.
- 8+ years of experience building scalable ETL pipelines using industry-standard ETL orchestration tools (e.g., Airflow, Composer, Oozie) with deep expertise in SQL, PySpark, or Scala.
- 3+ years of experience leading data engineering development directly with business or data science partners.
- Proven experience building, scaling, and maintaining multi-terabyte data sets, with an expansive toolbox for debugging and unblocking large-scale analytics challenges (e.g., skew mitigation, sampling strategies, accumulation patterns, data sketches).
- Experience with at least one major cloud provider’s suite of offerings (AWS, GCP, Azure).
- Experience developing or enhancing ETL orchestration tools or frameworks.
- Experience working within a standard GitOps workflow (branch and merge, Pull Requests, CI/CD systems).
- Experience working with GDPR regulations.
- Self-driven, challenge-loving, detail-oriented, with a strong teamwork spirit, excellent communication skills, and the ability to multitask and manage expectations.
Preferred Qualifications:
- MS/PhD in Computer Science/Engineering or a relevant technical field, with a specialization in Data Engineering.
- 3 years of experience with Google Cloud Platform (GCP) technologies, including BigQuery, Dataproc, Dataflow, Composer, and Looker.
Special Requirements:
Onsite role. GDPR experience required; Google Cloud Platform experience preferred.
Compensation & Location:
Salary: $70 – $90 per year (Estimated)
Location: Dallas, TX
Recruiter / Company – Contact Information
Recruiter / Employer: Yahoo
Email: esh@cloudthinktech.com
Recruiter Notice:
To remove this job posting, please send an email from esh@cloudthinktech.com with the subject: DELETE_JOB_ID_9036