
Job ID: JOB_ID_5783

Role Overview:

We are seeking an experienced Program Manager with a strong background in Site Reliability Engineering (SRE) and observability to lead large, multi-workstream technology initiatives. The ideal candidate will have a proven track record of delivering enterprise programs with a focus on reliability outcomes and possess excellent executive-level communication and governance skills.

Key Responsibilities:

  • Lead and manage enterprise SRE programs, focusing on Service Level Objectives (SLOs), Service Level Indicators (SLIs), incident management, and overall service reliability.
  • Drive the adoption of SRE best practices across the organization, including incident maturity, service reliability, and platform hardening.
  • Develop and maintain executive-grade governance artifacts and present program status and outcomes to VP/SVP-level leadership.
  • Oversee the delivery of technology initiatives, ensuring alignment with business objectives and SRE principles.
  • Collaborate with cross-functional teams to define and implement SLOs/SLIs, ensuring they are measurable and actionable.
  • Improve incident management processes, reducing Mean Time To Detect (MTTD) and Mean Time To Resolve (MTTR).
  • Implement strategies for alert tuning and resilience engineering across various platforms and applications.
  • Ensure the development and maintenance of runbooks and operational maturity documentation.
  • Manage project timelines, budgets, and resources effectively to ensure successful program delivery.
  • Foster a culture of reliability and continuous improvement within the engineering teams.

Required Qualifications:

  • 7+ years of SRE/Observability experience in infrastructure delivery.
  • 7+ years of overall experience in application and/or infrastructure delivery.
  • 3+ years of experience as a Senior Program Manager leading large, multi-workstream technology initiatives.
  • Demonstrated experience delivering enterprise programs with SRE or reliability outcomes (e.g., incident maturity, service reliability, platform hardening).
  • Proven ability to present to VP/SVP-level leadership and maintain executive-grade governance artifacts.
  • Familiarity with runbooks and operational maturity documentation.
  • Experience with SLO/SLI definition and adoption.
  • Experience with incident management and response maturity.
  • Experience with alert tuning and resilience engineering across platforms and applications.

Location:

Remote (based out of Chicago, IL)

Employment Type:

Contract


Special Requirements

Interview Type: Video


Compensation & Location

Salary: $120 – $160 per year (Estimated)

Location: Chicago, IL


Recruiter / Company – Contact Information

Email: lesh@neodymtechnologies.com


