Job ID: JOB_ID_980
Role Overview
The Site Reliability Engineering (SRE) position in Redmond, WA, offers an opportunity to manage and scale AI-centric infrastructure within the Microsoft Azure ecosystem. In 2026, the SRE role has evolved to include the management of AI Foundry and Retrieval-Augmented Generation (RAG) solutions, which requires a blend of traditional systems engineering and modern AI operations. The role is open to remote work, providing flexibility while maintaining a focus on high-availability services. The successful candidate will be responsible for deploying AI resources efficiently and for keeping service uptime within agreed reliability targets.
Key Responsibilities
- Deploy, configure, and manage AI-specific resources on Microsoft Azure, including AI Foundry and RAG architectures.
- Monitor service health and reliability, focusing on critical metrics such as uptime, availability, and latency.
- Integrate SRE metrics with enterprise-level monitoring systems to provide real-time visibility into system performance.
- Support and optimize CI/CD workflows using GitHub CLI, GitHub Repos, and Azure DevOps.
- Automate routine operational tasks to reduce toil and improve system efficiency.
- Collaborate with development teams to improve the reliability of C#-based applications through better observability and error handling.
- Perform root cause analysis for production incidents and implement long-term fixes to prevent recurrence.
- Manage cloud resource scaling to meet fluctuating demand while maintaining cost-effectiveness.
- Ensure compliance with security and data protection standards across all cloud environments.
Technical Requirements
Candidates should have significant experience with Microsoft Azure and its suite of AI services. A deep understanding of SRE principles, including error budgets, service level indicators (SLIs), and service level objectives (SLOs), is mandatory. Hands-on proficiency with GitHub and Azure DevOps is required for managing modern software lifecycles. Basic knowledge of C# is necessary to support application-level troubleshooting. This role is ideal for engineers who thrive in automated, cloud-native environments and are eager to work with the latest AI infrastructure technologies.
Special Requirements
Open to remote work; C2C contract at $60/hr.
Compensation & Location
Salary: $124,800 – $145,000 per year
Location: Redmond, WA
Recruiter / Company – Contact Information
Recruiter / Employer: Agile Enterprise Solutions Inc.
Email: susan_khan@aesincus.com
Recruiter Notice:
To remove this job posting, please send an email from susan_khan@aesincus.com with the subject: DELETE_JOB_ID_980