NEW: Posted 5 hours ago

Job ID: JOB_ID_9101

Role Description:

The Site Reliability Engineering (SRE) team is an integrated part of the DevOps team, which is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of Sonexus applications. SREs are expected to use a software development mindset to solve operational and platform challenges through automation.

We are looking for an SRE lead to fill in while the senior SRE is on active duty. The ideal candidate is a hands-on engineer with experience running platforms that leverage industry-standard observability and automation tools, someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.

As a Site Reliability Engineer at Sonexus, you’ll be part of efforts to ensure uptime, scalability, and incident resilience across Sonexus platforms. You’ll develop observability systems, engineer intelligent automation, and champion collaboration across teams. This role supports a shift toward predictive, AI-driven operations while maintaining industry-standard service levels.

Skills:

  • Collaborate across engineering, dev, and infrastructure teams to solve complex reliability challenges using automation and observability.
  • Maintain and evolve cloud-native Sonexus platforms (Azure preferred, GCP optional), optimizing performance and uptime.
  • Lead and participate in high-impact incident resolution (P1/P2) using playbooks, logs, and anomaly detection tools (Splunk, Dynatrace).
  • Develop and tune alerting strategies based on SLIs/SLOs, error budgets, and historical incident data to minimize noise.
  • Build and enhance dashboards for latency, error rates, throughput, and system health.
  • Automate runbooks and response workflows using Python or infrastructure-as-code tools (Terraform preferred).
  • Conduct root cause analysis with action tracking via Azure DevOps, enabling blameless postmortems.
  • Drive integration of observability best practices in application code, supporting feature telemetry and end-to-end traceability.
  • Set and refine SLIs/SLOs based on evolving system behavior and user experience.
  • Contribute to platform risk management and compliance activities.
  • Provide support for SRE-aligned change and release processes across CI/CD pipelines.
  • Assist in capacity planning, estimation, and system reliability strategy.
  • Lead P1/P2 incident management, ensuring rapid diagnosis and resolution using defined runbooks and playbooks.
  • Analyze logs, metrics, and signals using Splunk, Dynatrace, and other observability tools to detect anomalies.
  • Own production reliability, availability, and performance across Azure and GCP platforms.
  • Define and continuously improve incident response processes, escalation mechanisms, and post-incident RCA.
  • Collaborate with application, infrastructure, and cloud teams to prevent recurring incidents.
  • Drive automation, monitoring improvements, and reliability engineering best practices.
  • Provide technical leadership and mentorship to SRE operations teams.

Essential Skills:

We are seeking an experienced Site Reliability Engineering (SRE) Lead to manage production reliability, incident response, and cloud operations across Azure and GCP environments. The role will lead incident resolution, define operational playbooks, and drive system stability through monitoring and automation.

Education:

At least a bachelor’s degree (or equivalent experience) in Computer Science, Software/Electronics Engineering, Information Systems, or a closely related field is required.


Special Requirements

VISA: H1B with PP Number, Interview Mode: Not Specified, Domain: Not Specified


Compensation & Location

Salary: $104,000 per year

Location: Dublin, OH


Recruiter / Company – Contact Information

Recruiter / Employer: Sonexus

Email: manasa.s@itechus.net

