Job ID: JOB_ID_5743

Job Overview:

We are seeking a highly skilled and experienced Senior MLOps Technical Lead to join our team. The primary focus of this role is to build an intelligent, data-driven platform that supports the development of next-generation test analytics and test agents. The platform will connect test stations, line-level data, and pipelines, enabling faster insights, improved diagnostics, and scalable infrastructure for Generative AI systems. You will be responsible for building automated evaluation tools and conducting rigorous statistical analyses to ensure the reliability of both human and AI-based assessment systems.

Key Responsibilities:

  • Benchmark, adapt, and integrate AI/ML models into existing software systems.
  • Independently run and analyze ML experiments to drive real improvements.
  • Build intelligent, data-driven platforms for Generative AI systems.
  • Develop automated evaluation tools and conduct statistical analyses.
  • Ensure the reliability of human and AI-based assessment systems.

Must-Have Requirements:

  • Backend/Systems Experience: 3+ years building production backend or distributed systems (pre-AI experience required).
  • Production AI Systems: Shipped AI/LLM features serving real users at scale (not just prototypes or demos).
  • Agentic Systems: Built AI agents, skills, tools, or MCP (Model Context Protocol) integrations.
  • Python Proficiency: Essential for backend development.
  • Secondary Language: Working knowledge of Go, TypeScript, or Rust.
  • Cloud Infrastructure: Deep experience with AWS/GCP/Azure, including cost optimization and compute decisions, not just deployment.
  • Container & Orchestration: Hands-on experience with Docker and Kubernetes; ability to build, deploy, debug, and scale services independently.
  • LLM Integration: Understanding of token economics, context limits, rate limiting, structured outputs, and API failure modes.
  • LLM Evaluation: Knowledge of how to evaluate LLM outputs and their inherent challenges (non-determinism, quality measurement, regression detection).
  • Hands-On Engineer: Must be able to write code, debug production issues, and deploy their own work, not just act as an architect.

Preferred / Differentiators:

  • Experience building multi-step agentic workflows with tool use and function calling.
  • Experience with agent orchestration frameworks (LangGraph, CrewAI, or custom).
  • Experience building guardrails, fallbacks, or graceful degradation for AI systems.
  • Familiarity with streaming inference and async agent orchestration.
  • Expertise in cost/latency optimization techniques such as caching, batching, and prompt compression.
  • Experience with ML observability tools like Langfuse, Arize, Braintrust, or W&B.
  • Experience with retrieval systems (vector search, hybrid search) as a tool, not the primary focus.

Experience:

A minimum of 13 years of experience is mandatory.

Location:

Cupertino, CA / Austin, TX (Onsite Mandatory)

Employment Type:

Onsite W2


Special Requirements

US Citizens / Green Card holders only; onsite mandatory.


Compensation & Location

Salary: $150,000 – $200,000 per year (Estimated)

Location: Austin, TX


Recruiter / Company – Contact Information

Email: it.verma@tekfortune.com

