Posted 4 hours ago

Job ID: JOB_ID_6357

About the Role:

We are seeking an experienced Incident Manager to join our team. This role runs incident bridge calls, coordinates repair efforts for major incidents, and keeps stakeholders informed with regular communication updates. The ideal candidate will have a strong understanding of incident management processes, experience with a range of monitoring and collaboration tools, and the ability to stay composed and resourceful in high-pressure situations.

Key Responsibilities:

  • Run incident bridge calls with support teams, on-call application support teams, and management.
  • Coordinate repair efforts for all major incidents (P1-P4), including escalation, status updates, and assistance.
  • Provide regular communication updates to the customer, end users, and other stakeholders throughout the incident management lifecycle.
  • Track and document incident updates in real-time.
  • Handle escalated cases with composure and creative problem-solving.
  • Support the development and execution of change management plans to drive adoption of new processes, systems, and technologies.
  • Review changes for priority and urgency, and perform risk analysis.
  • Create problem tickets and their associated action items; review root cause analyses and their closure.
  • Prepare PIR and postmortem reports.
  • Lead site reliability, disaster recovery, game day, switchover, and failover activities.
  • Use monitoring, ticketing, and collaboration tools such as ServiceNow, PagerDuty, Slack, Zoom, and Jira.
  • Perform quality audits and data analysis on incident tickets to maintain quality and identify emerging trends.
  • Meet SLAs and other agreed-upon KPIs, and produce process performance reports.
  • Contribute documentation to the Known Error Database (KEDB) or a similar repository.
  • Develop processes and procedures to ensure that incident management action items are tracked and completed.
  • Ensure process adherence and meet quality standards.
  • Provide management reporting on incident metrics and incident management performance.

Qualifications/Skills Required:

  • Degree in Computer Science, Information Technology, or related field.
  • 7-10 years of experience in incident management or a related field.
  • Knowledge of cloud services (AWS, Azure, or GCP) is required.
  • Advanced proficiency in site reliability engineering (SRE) culture and principles, with the ability to implement SRE practices across platform teams while avoiding common pitfalls.
  • Experience in planning and conducting site reliability testing.
  • Experience in Application Management Services (AMS).
  • Knowledge of incident management, change management, and problem management processes and procedures.
  • Experience with and knowledge of change management principles, methodologies, and tools.
  • Excellent problem-solving and analytical skills.
  • Excellent verbal and written communication and interpersonal skills.
  • Ability to work independently and as part of a team.
  • Ability to manage multiple tasks simultaneously.

Special Requirements

  • Visa constraints: Not specified.
  • Screening steps: Not specified.
  • Interview modes: Not specified.
  • Domain restrictions: Not specified.


Compensation & Location

Salary: $110,000 – $150,000 per year

Location: Oakland, CA


Recruiter / Company – Contact Information

Recruiter / Employer: Nityo Infotech Corp.

Email: thosh.n@nityo.com


