Job ID: JOB_ID_2514
Position Overview: AI Operations Platform Consultant
The AI Operations Platform Consultant is a specialized technical role designed for professionals who excel in the operational management of high-performance Artificial Intelligence infrastructure. As organizations increasingly move from experimental AI models to production-grade Large Language Model (LLM) applications, the need for robust, scalable, and resilient platforms has never been greater. This role is specifically focused on the ‘Ops’ side of the AI lifecycle—MLOps and LLMOps—ensuring that the underlying containerized services, hardware accelerators (GPUs), and inference servers are tuned for maximum efficiency and reliability.
Unlike traditional AI engineering roles that focus on model architecture or data science research, this is a hands-on technical troubleshooting position. You will be responsible for the health of production solutions, identifying and resolving bottlenecks in real time, and ensuring that mission-critical applications running on Kubernetes (specifically Red Hat OpenShift) maintain high availability. This is a high-impact role that sits at the intersection of platform engineering, cloud-native infrastructure, and cutting-edge AI deployment.
Key Responsibilities and Duties
- MLOps & LLMOps Pipeline Management: Oversee the end-to-end lifecycle of AI models, from deployment to retirement. You will manage, operate, and support complex pipelines that automate the rollout of LLMs into production environments.
- Production Troubleshooting: Act as the primary technical expert for diagnosing and resolving issues within containerized AI services. This includes deep-dive analysis of LLM model failures, latency spikes, and resource contention on Kubernetes clusters.
- Model Optimization: Implement advanced optimization techniques to improve inference performance, including pruning, quantization, and knowledge distillation, as well as mixed precision and sharding to maximize the utility of GPU resources (see the quantization and mixed-precision sketch after this list).
- Inference Server Configuration: Expertly configure and tune NVIDIA Triton Inference Servers and TensorRT-LLM engines. You will be responsible for setting up dynamic batching, model versioning, and concurrent execution to handle high-throughput workloads (an example batching configuration follows this list).
- Infrastructure Monitoring: Develop and maintain comprehensive observability for AI inference systems. You will use telemetry and custom dashboards to track GPU health, memory usage, throughput, and service availability, ensuring proactive identification of potential failures (a metrics-scraping sketch follows this list).
- Standard Operational Processes: Adhere to and improve standard ITIL processes, including incident management, change management, and event management. You will ensure that all platform changes are documented and that incidents are resolved within defined SLAs.
- Scalable Architecture Support: Manage scalable infrastructure for deploying LLMs, utilizing GPU-aware scheduling and load balancing to ensure that the platform can grow alongside the organization’s AI needs.
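As an illustration of the model-optimization duty above, here is a minimal Python sketch that applies post-training dynamic quantization to a stand-in model's linear layers and runs a mixed-precision forward pass with torch.autocast. The model, tensor shapes, and dtypes are placeholders for illustration only, not requirements of this posting.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a real LLM component; any nn.Module works here.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).eval()

# Post-training dynamic quantization: Linear weights are stored as int8 and
# activations are quantized on the fly at inference time (a CPU-oriented technique).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 4096)
with torch.inference_mode():
    y_int8 = quantized(x)

# Mixed precision on GPU: run the forward pass in fp16 while the module itself stays fp32.
if torch.cuda.is_available():
    model_gpu = model.to("cuda")
    x_gpu = x.to("cuda")
    with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
        y_fp16 = model_gpu(x_gpu)
```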
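For the inference-server duty, dynamic batching and GPU instance placement are typically declared in a Triton model configuration file (config.pbtxt). The sketch below writes one such configuration for a hypothetical model named llm_backbone; the backend name, batch sizes, queue delay, and instance count are illustrative assumptions (TensorRT-LLM deployments in particular may rely on in-flight batching and additional backend-specific parameters).

```python
from pathlib import Path

# Hypothetical Triton model repository layout: <repo>/<model_name>/config.pbtxt
model_dir = Path("model_repository/llm_backbone")
model_dir.mkdir(parents=True, exist_ok=True)

# Minimal config enabling dynamic batching and two GPU instances of the model.
# preferred_batch_size and max_queue_delay_microseconds trade latency for throughput.
config = """
name: "llm_backbone"
backend: "tensorrtllm"
max_batch_size: 64
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 500
}
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
"""
(model_dir / "config.pbtxt").write_text(config.strip() + "\n")
```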
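For the monitoring duty, Triton exposes Prometheus-format metrics over HTTP (by default on port 8002 at /metrics), which is one common telemetry source for GPU health and throughput dashboards. The sketch below polls that endpoint and extracts a few samples; the endpoint URL and metric names are assumptions to verify against the deployed Triton version.

```python
import re
import requests

# Default Triton metrics endpoint; adjust host/port to match the deployment.
METRICS_URL = "http://localhost:8002/metrics"

# Metric name prefixes of interest (GPU utilization, GPU memory, inference counts);
# exact names can differ between Triton releases, so verify against the /metrics output.
WATCHED = ("nv_gpu_utilization", "nv_gpu_memory_used_bytes", "nv_inference_count")

def scrape_metrics(url: str = METRICS_URL) -> dict[str, float]:
    """Fetch the Prometheus text exposition and return watched samples keyed by metric line."""
    text = requests.get(url, timeout=5).text
    samples: dict[str, float] = {}
    for line in text.splitlines():
        if line.startswith(WATCHED):
            # Prometheus text format: `metric_name{labels} value`
            match = re.match(r"^(\S+)\s+([0-9.eE+-]+)$", line)
            if match:
                samples[match.group(1)] = float(match.group(2))
    return samples

if __name__ == "__main__":
    for name, value in scrape_metrics().items():
        print(f"{name} = {value}")
```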
Technical Requirements and Qualifications
- Kubernetes Expertise: Extensive experience deploying and managing containerized services at scale on Kubernetes, with a strong preference for Red Hat OpenShift environments.
- AI Tooling: Proven proficiency with NVIDIA’s AI software stack, specifically Triton Inference Server and TensorRT-LLM. You should have a deep understanding of their architecture and configuration.
- GPU Acceleration: Hands-on experience operating large-scale GPU-accelerated platforms. You must understand how to optimize workloads for multi-node clusters and manage GPU-specific resources (see the GPU scheduling sketch after this list).
- Deployment Workflows: Experience building and optimizing production-grade LLM pipelines, including containerization, microservices design, and API development (an example inference API call follows this list).
- Problem Solving: A strong background in technical troubleshooting for mission-critical systems. You should be comfortable working in high-pressure environments where rapid resolution is required.
- Local Requirement: Candidates must be local to either Charlotte, NC or Jersey City, NJ and possess a valid Driver’s License (DL) for potential on-site requirements.
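On Kubernetes and OpenShift, GPU capacity is exposed to the scheduler as an extended resource (typically nvidia.com/gpu), and requesting it is what places a pod on a GPU node. The sketch below uses the kubernetes Python client to create a Deployment that requests one GPU for a Triton container; the image tag, namespace, labels, and resource sizes are placeholder values, not figures from this posting.

```python
from kubernetes import client, config

# Load local kubeconfig (use config.load_incluster_config() when running inside the cluster).
config.load_kube_config()

container = client.V1Container(
    name="triton",
    # Placeholder image tag; pin to the Triton release validated for the platform.
    image="nvcr.io/nvidia/tritonserver:24.05-py3",
    args=["tritonserver", "--model-repository=/models"],
    resources=client.V1ResourceRequirements(
        # Requesting the extended GPU resource makes the scheduler pick a GPU node.
        limits={"nvidia.com/gpu": "1", "memory": "32Gi"},
        requests={"cpu": "4", "memory": "32Gi"},
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="triton-inference", labels={"app": "triton"}),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "triton"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "triton"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="ai-inference", body=deployment)
```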
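On the API side, serving requests ultimately reach the inference server over its HTTP or gRPC endpoint. The sketch below uses the tritonclient Python package to call the same hypothetical llm_backbone model over HTTP; the tensor names, shapes, and datatypes are placeholders that must match the deployed model's configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to Triton's HTTP endpoint (default port 8000); adjust the URL for the deployment.
triton = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input tensor: name, shape, and datatype must match the model's config.pbtxt.
input_ids = httpclient.InferInput("input_ids", [1, 16], "INT32")
input_ids.set_data_from_numpy(np.zeros((1, 16), dtype=np.int32))

# Request a specific output tensor by name (also a placeholder).
requested = httpclient.InferRequestedOutput("logits")

result = triton.infer(model_name="llm_backbone", inputs=[input_ids], outputs=[requested])
logits = result.as_numpy("logits")
print(logits.shape)
```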
Interview and Selection Process
The interview process for this role is designed to be thorough and technical. It typically consists of 2 to 3 rounds:
1. Technical Screening: An initial assessment of your hands-on skills with Kubernetes, Triton, and LLM optimization.
2. Internal Interview: A deeper dive with the engineering and operations leadership team to discuss your experience with mission-critical systems.
3. Customer/Final Interview: A final round, potentially involving the end customer in Charlotte, NC, to ensure a strong fit for the specific project environment.
Special Requirements
Candidates must be local and hold a valid driver's license (DL). Interview: 2-3 rounds (technical screening, internal, customer).
Compensation & Location
Salary: $180,000 – $260,000 per year
Location: Charlotte, NC
Recruiter / Company – Contact Information
Recruiter / Employer: 3B Staffing LLC
Email: tulsi@3bstaffing.com