Job ID: JOB_ID_5743
Job Overview:
We are seeking a highly skilled and experienced Senior MLOps Technical Lead to join our team. The primary focus of this role is to build an intelligent, data-driven platform that supports the development of next-generation test analytics and test agents. This platform will enable faster insights, improved diagnostics, and a scalable infrastructure for Generative AI systems, connecting test stations, line-level data, and pipelines. You will be responsible for building automated evaluation tools and conducting rigorous statistical analyses to ensure the reliability of both human and AI-based assessment systems.
Key Responsibilities:
- Benchmark, adapt, and integrate AI/ML models into existing software systems.
- Independently run and analyze ML experiments to drive real improvements.
- Build intelligent, data-driven platforms for Generative AI systems.
- Develop automated evaluation tools and conduct statistical analyses.
- Ensure the reliability of human and AI-based assessment systems.
Must-Have Requirements:
- Backend/Systems Experience: 3+ years building production backend or distributed systems (pre-AI experience required).
- Production AI Systems: Shipped AI/LLM features serving real users at scale (not just prototypes or demos).
- Agentic Systems: Built AI agents, skills, tools, or MCP (Model Context Protocol) integrations.
- Python Proficiency: Essential for backend development.
- Secondary Language: Working knowledge of Go, TypeScript, or Rust.
- Cloud Infrastructure: Deep experience with AWS/GCP/Azure, including cost optimization and compute decisions, not just deployment.
- Container & Orchestration: Hands-on experience with Docker and Kubernetes; ability to build, deploy, debug, and scale services independently.
- LLM Integration: Understanding of token economics, context limits, rate limiting, structured outputs, and API failure modes.
- LLM Evaluation: Knowledge of how to evaluate LLM outputs and their inherent challenges (non-determinism, quality measurement, regression detection).
- Hands-On Engineer: Must be able to write code, debug production issues, and deploy their own work, not just act as an architect.
Preferred / Differentiators:
- Experience building multi-step agentic workflows with tool use and function calling.
- Experience with agent orchestration frameworks (LangGraph, CrewAI, or custom).
- Experience building guardrails, fallbacks, or graceful degradation for AI systems.
- Familiarity with streaming inference and async agent orchestration.
- Expertise in cost/latency optimization techniques such as caching, batching, and prompt compression.
- Experience with ML observability tools like Langfuse, Arize, Braintrust, or W&B.
- Experience with retrieval systems (vector search, hybrid search) as a tool, not the primary focus.
Experience:
A minimum of 13 years of experience is mandatory.
Location:
Cupertino, CA / Austin, TX (Onsite Mandatory)
Employment Type:
Onsite W2
Special Requirements:
U.S. citizen (USC) or green card (GC) holder only; onsite mandatory
Compensation & Location:
Salary: $150,000 – $200,000 per year (Estimated)
Location: Austin, TX
Recruiter / Company – Contact Information:
Email: it.verma@tekfortune.com