Job ID: JOB_ID_980

Role Overview

The Site Reliability Engineering (SRE) position in Redmond, WA, offers an opportunity to manage and scale AI-centric infrastructure within the Microsoft Azure ecosystem. As of 2026, the SRE role includes managing AI Foundry and Retrieval-Augmented Generation (RAG) solutions, which requires a blend of traditional systems engineering and modern AI operations. The role is open to remote work, providing flexibility while keeping the focus on high-availability services. The successful candidate will be responsible for deploying AI resources efficiently and for keeping service uptime within agreed availability targets.

Key Responsibilities

  • Deploy, configure, and manage AI-specific resources on Microsoft Azure, including AI Foundry and RAG architectures.
  • Monitor service health and reliability, focusing on critical metrics such as uptime, availability, and latency.
  • Integrate SRE metrics with enterprise-level monitoring systems to provide real-time visibility into system performance.
  • Support and optimize CI/CD workflows using GitHub CLI, GitHub Repos, and Azure DevOps.
  • Automate routine operational tasks to reduce toil and improve system efficiency.
  • Collaborate with development teams to improve the reliability of C#-based applications through better observability and error handling.
  • Perform root cause analysis for production incidents and implement long-term fixes to prevent recurrence.
  • Manage cloud resource scaling to meet fluctuating demand while maintaining cost-effectiveness.
  • Ensure compliance with security and data protection standards across all cloud environments.

Technical Requirements

Candidates should have significant experience with Microsoft Azure and its suite of AI services. A deep understanding of SRE principles, including error budgets, service level indicators (SLIs), and service level objectives (SLOs), is mandatory. Hands-on proficiency with GitHub and Azure DevOps is required for managing the software delivery lifecycle. Basic knowledge of C# is necessary to support application-level troubleshooting. This role suits engineers who thrive in automated, cloud-native environments and are eager to work with current AI infrastructure technologies.


Special Requirements

Open to remote work; corp-to-corp (C2C) contract at $60/hr.


Compensation & Location

Salary: $124,800 – $145,000 per year

Location: Redmond, WA


Recruiter / Company – Contact Information

Recruiter / Employer: Agile Enterprise Solutions Inc.

Email: susan_khan@aesincus.com


