Posted 2 hours ago

Job ID: JOB_ID_5743

Job Overview:

We are seeking a highly skilled and experienced Senior MLOps Technical Lead to join our team. The primary focus of this role is to build an intelligent, data-driven platform that supports the development of next-generation test analytics and test agents. This platform will enable faster insights, improved diagnostics, and a scalable infrastructure for Generative AI systems, connecting test stations, line-level data, and pipelines. You will be responsible for building automated evaluation tools and conducting rigorous statistical analyses to ensure the reliability of both human and AI-based assessment systems.

Key Responsibilities:

Benchmark, adapt, and integrate AI/ML models into existing software systems.
Independently run and analyze ML experiments to drive real improvements.
Build intelligent, data-driven platforms for Generative AI systems.
Develop automated evaluation tools and conduct statistical analyses.
Ensure the reliability of human and AI-based assessment systems.

Must-Have Requirements:

Backend/Systems Experience: 3+ years building production backend or distributed systems (pre-AI experience required).
Production AI Systems: Shipped AI/LLM features serving real users at scale (not just prototypes or demos).
Agentic Systems: Built AI agents, skills, tools, or MCP (Model Context Protocol) integrations.
Python Proficiency: Essential for backend development.
Secondary Language: Working knowledge of Go, TypeScript, or Rust.
Cloud Infrastructure: Deep experience with AWS/GCP/Azure, including cost optimization and compute decisions, not just deployment.
Container & Orchestration: Hands-on experience with Docker and Kubernetes; ability to build, deploy, debug, and scale services independently.
LLM Integration: Understanding of token economics, context limits, rate limiting, structured outputs, and API failure modes.
LLM Evaluation: Knowledge of how to evaluate LLM outputs and their inherent challenges (non-determinism, quality measurement, regression detection).
Hands-On Engineer: Must be able to write code, debug production issues, and deploy their own work, not just act as an architect.

Preferred / Differentiators:

Experience building multi-step agentic workflows with tool use and function calling.
Experience with agent orchestration frameworks (LangGraph, CrewAI, or custom).
Experience building guardrails, fallbacks, or graceful degradation for AI systems.
Familiarity with streaming inference and async agent orchestration.
Expertise in cost/latency optimization techniques such as caching, batching, and prompt compression.
Experience with ML observability tools like Langfuse, Arize, Braintrust, or W&B.
Experience with retrieval systems (vector search, hybrid search) as a tool, not the primary focus.

Experience:

A minimum of 13 years of experience is required.

Location:

Cupertino, CA / Austin, TX (Onsite Mandatory)

Employment Type:

Onsite W2

Special Requirements

U.S. citizens or green card holders only; onsite mandatory

Compensation & Location

Salary: $150,000 – $200,000 per year (Estimated)

Location: Austin, TX

Recruiter / Company – Contact Information

Email: it.verma@tekfortune.com

