NEW: Posted 5 hours ago

Job ID: JOB_ID_9132

Job Overview

We are seeking a NOC Analyst to provide 24/7 monitoring and incident response for mission-critical healthcare SaaS platforms supporting federal CMS programs. This role is the first line of defense for production systems, responsible for detecting issues, executing runbooks, and escalating when needed to ensure platform stability and customer satisfaction.

The ideal candidate has strong technical troubleshooting skills, experience with monitoring and observability tools, and the ability to work effectively in a follow-the-sun operations model with global teams.

Key Responsibilities

Monitor production systems 24/7 using observability and alerting platforms (Datadog and PagerDuty) to detect anomalies, alerts, and incidents
Respond to incidents within SLA, perform initial triage, execute runbooks, and engage on-call engineers when necessary
Log and manage incidents in ITSM platforms (Jira Service Management, ServiceNow) with accurate categorization, priority, and documentation
Execute documented runbooks for common issues (application restarts, health checks, certificate renewals)
Access and manage systems through SSH, CLI-based VM management, and the AWS console, and support and manage containerized environments (Docker/Kubernetes)
Perform health checks and proactive monitoring to identify degradation before customer impact
Coordinate with engineering, SRE, and service delivery teams during major incidents and change windows
Maintain shift handoff documentation and participate in daily operational standup meetings
Escalate incidents appropriately based on severity, customer impact, and technical complexity
Document known errors, workarounds, and lessons learned to improve the operational knowledge base

Required Qualifications

13+ years of experience in NOC, IT operations, technical support, or similar monitoring roles
Experience with monitoring and observability tools (Datadog, Splunk, New Relic, or similar)
Familiarity with incident management and ITSM platforms (Jira Service Management, ServiceNow, PagerDuty)
Moderate understanding of cloud infrastructure (AWS, Azure, GCP), APIs, and web services, e.g., AWS SysOps Associate certification or equivalent
Strong troubleshooting skills and ability to follow technical runbooks under pressure
Excellent written and verbal communication skills for incident updates and handoffs
Willingness to work rotating shifts including nights, weekends, and holidays (24/7 follow-the-sun model)

Preferred Qualifications

Experience in healthcare, government, or regulated environments (HIPAA, CMS compliance)
ITIL v4 Foundation certification
Familiarity with scripting (PowerShell, Python, Bash) for basic automation tasks
Experience supporting SaaS or cloud-native applications
Understanding of networking concepts (DNS, load balancers, firewalls, SSL/TLS)

Key Metrics for Success

Incident response time (time to acknowledge and begin triage)
Escalation accuracy (appropriate escalations with complete context)
SLA compliance for incident acknowledgment and initial response
Quality of incident documentation and handoff notes
Reduction in repeat alerts through proactive action

Employment Details

Full-time position with rotating shift coverage (24/7 operations)
Follow-the-sun model with US and global team coordination
Must be eligible to work on U.S. federal contracts (background check may be required)

Compensation & Location

Salary: $35 – $40 per year

Location: Center, CO

Recruiter / Company – Contact Information

Email: ilja.porwaal@rulesiq.com

Interested in this position?
Apply via Email
