
Job ID: JOB_ID_2297

Role Overview

We are seeking a highly skilled Data Engineer to join our Human Interactive Driving division, within the Extreme Performance Intelligent Control department. This role is central to developing scalable, human-like driving intelligence from data collected from expert human drivers. You will own the data infrastructure that powers our world models, perception systems, and reinforcement learning algorithms. This is a hybrid role based in Los Altos, CA, with two days onsite (Tuesday and Wednesday) and a 4×10 schedule (four 10-hour days, with every Friday off).

Key Responsibilities

  • Design, implement, and maintain robust data pipelines for ingesting, cleaning, and transforming large-scale autonomous vehicle datasets, including camera, LiDAR, radar, GPS, and simulation logs.
  • Develop and manage scalable storage and retrieval systems using AWS services such as S3, EC2, SageMaker, and Athena.
  • Ensure high data quality and consistency through automated validation, deduplication, and strict schema enforcement.
  • Collaborate with ML research scientists and engineers to provide efficient access to training data, labels, and metadata.
  • Optimize data preprocessing and batching pipelines to support large-scale training and evaluation workflows.
  • Build and maintain tools for dataset versioning, experiment tracking, and feature reproducibility.
  • Implement CI/CD workflows for data and pipeline updates to ensure minimal downtime and reproducible outputs.
  • Monitor pipeline performance and proactively address bottlenecks or outages.

Technical Requirements

  • B.S. or M.S. in Computer Science, Data Engineering, or a related field.
  • Minimum of 3 years of experience building production-grade data infrastructure or ML data pipelines.
  • Strong proficiency in Python and SQL.
  • Experience with data workflow orchestration tools like Airflow, Prefect, or Luigi.
  • Deep expertise in AWS services, particularly S3, EC2, and SageMaker.
  • Familiarity with distributed computing frameworks such as Spark, Dask, or Ray.
  • Understanding of best practices for dataset documentation and standardization in a research environment.

Preferred Qualifications

  • Experience handling autonomous vehicle datasets or robotics sensor data.
  • Familiarity with ML training pipelines and model evaluation workflows.
  • Prior experience collaborating with researchers or applied ML teams in high-throughput environments.
  • AWS certifications (Solutions Architect, Data Analytics, or Machine Learning).

Work Environment and Culture

Our team is dedicated to amplifying the human experience through transformative mobility solutions. We value ownership, innovation, and diversity. As a Data Engineer, you will be embedded in research-facing projects, enabling breakthroughs in AI and robotics. The role offers a chance to shape cutting-edge systems with real-world impact on the development of advanced driver-assistance systems (ADAS) and autonomous vehicles.


Special Requirements

U.S. citizens or Green Card holders only; local candidates with a valid local driver's license; hybrid, 2 days onsite (Tue/Wed); 4×10 shift (off Fridays); tech screens conducted in San Ramon, CA.


Compensation & Location

Salary: $145,000–$195,000 per year (estimated)

Location: Los Altos, CA


Recruiter / Company – Contact Information

Recruiter / Employer: SHEF SOLUTIONS INC.

Email: tarun@shefsolutionsinc.com


Interested in this position?
Apply via Email
