NEW | Posted 4 hours ago

Job ID: JOB_ID_9036

About the Role:

As part of the Mail Analytics Data Engineering team, you will be instrumental in building and maintaining large-scale batch pipelines, data serving systems, data lakehouses, and analytics systems. Your work will enable mission-critical decision-making, power downstream applications, and support AI-driven capabilities.

If you are passionate about building robust data infrastructure and platforms that drive modern data- and AI-driven businesses at scale, we encourage you to apply!

Your Day-to-Day Responsibilities:

Partner with Data Science, Product, and Engineering teams to gather requirements and define the data ontology for Mail Data & Analytics.
Lead and mentor junior Data Engineers, supporting Yahoo Mail’s evolving data needs.
Design, build, and maintain efficient and reliable batch data pipelines to populate core data sets.
Develop scalable frameworks and tooling to automate analytics workflows and streamline user interactions with data products.
Establish and promote standard methodologies for data operations and lifecycle management.
Build new large-scale data infrastructure and systems for data processing or serving, and improve and maintain existing ones, optimizing complex code through advanced algorithmic concepts and a deep understanding of the underlying data system stacks.
Create and contribute to frameworks that improve the management and deployment of data platforms and systems, while working with data infrastructure to triage and resolve issues.
Prototype new metrics or data systems.
Define and manage Service Level Agreements (SLAs) for all data sets within your areas of ownership.
Develop complex queries, very high-volume data pipelines, and analytics applications to solve challenging data engineering problems.
Collaborate with engineers, data scientists, and product managers to understand business problems and technical requirements, delivering effective data solutions.
Provide engineering consulting on large, complex data lakehouse initiatives.

You Must Have:

BS in Computer Science/Engineering, a relevant technical field, or equivalent practical experience, with a specialization in Data Engineering.
8+ years of experience building scalable ETL pipelines using industry-standard ETL orchestration tools (e.g., Airflow, Composer, Oozie) with deep expertise in SQL, PySpark, or Scala.
3+ years of experience leading data engineering development directly with business or data science partners.
Proven experience building, scaling, and maintaining multi-terabyte data sets, with a broad toolbox for debugging and unblocking large-scale analytics challenges (e.g., skew mitigation, sampling strategies, accumulation patterns, data sketches).
Experience with at least one major cloud provider’s suite of offerings (AWS, GCP, Azure).
Experience developing or enhancing ETL orchestration tools or frameworks.
Experience working within a standard GitOps workflow (branch and merge, Pull Requests, CI/CD systems).
Experience working with GDPR regulations.
Self-driven, challenge-loving, detail-oriented, with a strong teamwork spirit, excellent communication skills, and the ability to multitask and manage expectations.

Preferred Qualifications:

MS/PhD in Computer Science/Engineering or a relevant technical field, with a specialization in Data Engineering.
3 years of experience with Google Cloud Platform (GCP) technologies, including BigQuery, Dataproc, Dataflow, Composer, and Looker.

Special Requirements

Onsite. GDPR experience required; Google Cloud Platform experience preferred.

Compensation & Location

Salary: $70 – $90 per year (Estimated)

Location: Dallas, TX

Recruiter / Company – Contact Information

Recruiter / Employer: Yahoo

Email: esh@cloudthinktech.com
