Job ID: JOB_ID_5783
Role Overview:
We are seeking an experienced Program Manager with a strong background in Site Reliability Engineering (SRE) and observability to lead large, multi-workstream technology initiatives. The ideal candidate will have a proven track record of delivering enterprise programs with a focus on reliability outcomes and possess excellent executive-level communication and governance skills.
Key Responsibilities:
- Lead and manage enterprise SRE programs, focusing on Service Level Objectives (SLOs), Service Level Indicators (SLIs), incident management, and overall service reliability.
- Drive the adoption of SRE best practices across the organization, including incident maturity, service reliability, and platform hardening.
- Develop and maintain executive-grade governance artifacts and present program status and outcomes to VP/SVP-level leadership.
- Oversee the delivery of technology initiatives, ensuring alignment with business objectives and SRE principles.
- Collaborate with cross-functional teams to define and implement SLOs/SLIs, ensuring they are measurable and actionable.
- Improve incident management processes, reducing Mean Time To Detect (MTTD) and Mean Time To Resolve (MTTR).
- Implement strategies for alert tuning and resilience engineering across various platforms and applications.
- Ensure the development and maintenance of runbooks and operational maturity documentation.
- Manage project timelines, budgets, and resources effectively to ensure successful program delivery.
- Foster a culture of reliability and continuous improvement within the engineering teams.
Required Qualifications:
- 7+ years of SRE/observability experience in infrastructure delivery.
- 7+ years of overall experience in application and/or infrastructure delivery.
- 3+ years of experience as a Senior Program Manager leading large, multi-workstream technology initiatives.
- Demonstrated experience delivering enterprise programs with SRE or reliability outcomes (e.g., incident maturity, service reliability, platform hardening).
- Proven ability to present to VP/SVP-level leadership and maintain executive-grade governance artifacts.
- Familiarity with runbooks and operational maturity documentation.
- Experience with SLO/SLI definition and adoption.
- Experience with incident management and response maturity.
- Experience with alert tuning and resilience engineering across platforms and applications.
Location:
Remote (based out of Chicago, IL)
Employment Type:
Contract
Special Requirements:
Interview Type: Video
Compensation & Location:
Salary: $120 – $160 per year (Estimated)
Location: Chicago, IL
Recruiter / Company Contact Information:
Email: lesh@neodymtechnologies.com