Job ID: JOB_ID_703
Position Summary
We are looking for a dedicated Site Reliability Engineer (SRE) with a specialized focus on Order Management Systems (OMS), specifically IBM Sterling OMS, to join our team in Marlborough, MA. This onsite position is pivotal in ensuring the reliability, scalability, and performance of our mission-critical order processing platforms. You will bridge the gap between software engineering and systems operations, applying a software engineering mindset to system administration challenges.
Core Responsibilities
- Maintain and monitor the health of Sterling OMS platforms to ensure compliance with strict Service Level Agreements (SLAs).
- Develop sophisticated automation tools and scripts to eliminate operational toil and manual intervention in system management.
- Streamline deployment processes and incident response workflows through CI/CD integration and automated recovery scripts.
- Lead the Incident Management process, including conducting thorough Root Cause Analysis (RCA) and post-mortem reviews for production issues.
- Implement proactive monitoring, logging, and distributed tracing solutions to gain deep observability into system health and user journeys.
- Conduct rigorous capacity planning and performance tuning to prepare the system for peak seasonal order volumes and high-traffic events.
- Collaborate closely with development teams to ensure that new features are designed with reliability, scalability, and maintainability in mind.
- Manage and coordinate with offshore teams to ensure 24/7 operational coverage and seamless handovers.
- Optimize cloud infrastructure resources on platforms like AWS, Azure, or GCP to balance performance and cost-efficiency.
- Manage containerized environments using Kubernetes and Docker to ensure consistent application delivery across environments.
Required Qualifications
- Extensive technical expertise in Order Management Systems, with a primary focus on IBM Sterling OMS.
- Strong programming and scripting skills in Java, Python, or shell for automation and tool development.
- In-depth knowledge of Linux/Unix system administration and internals.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP) and modern DevOps tools.
- Proficiency in containerization and container orchestration technologies, specifically Docker and Kubernetes.
- Experience supporting large-scale, distributed software architectures in a production environment.
Communication and Leadership
The successful candidate will demonstrate superior communication skills and be able to articulate technical challenges and solutions to diverse audiences. You must be comfortable working in a fast-paced environment, managing multiple priorities, and leading technical discussions during critical incidents. Experience coordinating with global teams and managing offshore resources is highly valued in this role.
Special Requirements
Onsite position; offshore coordination required
Compensation & Location
Salary: $160,000 – $215,000 per year
Location: Marlborough, MA
Recruiter / Company – Contact Information
Recruiter / Employer: KK Software Associates
Email: khadar@kksoftwareassociates.com