NEW: Posted 5 hours ago

Job ID: JOB_ID_9132

Job Overview

We are seeking a NOC Analyst to provide 24/7 monitoring and incident response for mission-critical healthcare SaaS platforms supporting federal CMS programs. This role is the first line of defense for production systems, responsible for detecting issues, executing runbooks, and escalating when needed to ensure platform stability and customer satisfaction.

The ideal candidate has strong technical troubleshooting skills, experience with monitoring and observability tools, and the ability to work effectively in a follow-the-sun operations model with global teams.

Key Responsibilities

  • Monitor production systems 24/7 using observability and alerting platforms (Datadog, PagerDuty) to detect anomalies, alerts, and incidents
  • Respond to incidents within SLA, perform initial triage, execute runbooks, and engage on-call engineers when necessary
  • Log and manage incidents in ITSM platforms (Jira Service Management, ServiceNow) with accurate categorization, priority, and documentation
  • Execute documented runbooks for common issues (application restarts, health checks, certificate renewals)
  • Use remote access methods (SSH, CLI-based VM management, and the AWS console) and support and manage containerized environments (Docker/Kubernetes)
  • Perform health checks and proactive monitoring to identify degradation before customer impact
  • Coordinate with engineering, SRE, and service delivery teams during major incidents and change windows
  • Maintain shift handoff documentation and participate in daily operational standup meetings
  • Escalate incidents appropriately based on severity, customer impact, and technical complexity
  • Document known errors, workarounds, and lessons learned to improve the operational knowledge base

Required Qualifications

  • 13+ years of experience in NOC, IT operations, technical support, or similar monitoring roles
  • Experience with monitoring and observability tools (Datadog, Splunk, New Relic, or similar)
  • Familiarity with incident management and ITSM platforms (Jira Service Management, ServiceNow, PagerDuty)
  • Moderate understanding of cloud infrastructure (AWS, Azure, GCP), APIs, and web services, e.g., at the level of an AWS SysOps Associate certification or equivalent
  • Strong troubleshooting skills and ability to follow technical runbooks under pressure
  • Excellent written and verbal communication skills for incident updates and handoffs
  • Willingness to work rotating shifts including nights, weekends, and holidays (24/7 follow-the-sun model)

Preferred Qualifications

  • Experience in healthcare, government, or regulated environments (HIPAA, CMS compliance)
  • ITIL v4 Foundation certification
  • Familiarity with scripting (PowerShell, Python, Bash) for basic automation tasks
  • Experience supporting SaaS or cloud-native applications
  • Understanding of networking concepts (DNS, load balancers, firewalls, SSL/TLS)

Key Metrics for Success

  • Incident response time (time to acknowledge and begin triage)
  • Escalation accuracy (appropriate escalations with complete context)
  • SLA compliance for incident acknowledgment and initial response
  • Quality of incident documentation and handoff notes
  • Reduction in repeat alerts through proactive action

Employment Details

  • Full-time position with rotating shift coverage (24/7 operations)
  • Follow-the-sun model with US and global team coordination
  • Must be eligible to work on U.S. federal contracts (background check may be required)


Compensation & Location

Salary: $35 – $40 per year

Location: Center, CO


Recruiter / Company – Contact Information

Email: ilja.porwaal@rulesiq.com


Interested in this position?
Apply via Email
