Job ID: JOB_ID_7339
Role Overview:
We are seeking a highly experienced Java Application Support Lead to join our team. This role leads L3 support for Java/Spring Boot microservices in a cloud-native (EKS/Kubernetes) environment. You will own incident triage, root cause analysis (RCA), and resolution across complex distributed systems. The position requires a strong understanding of monitoring tools, CI/CD practices, and ITIL processes, along with excellent leadership and communication skills.
Key Responsibilities:
- Lead L3 support for Java/Spring Boot microservices in a cloud-native (EKS/Kubernetes) environment.
- Own incident triage, RCA, and resolution across distributed systems.
- Oversee monitoring using tools like AppDynamics, Grafana, ELK, and CloudWatch, ensuring proactive alerting.
- Drive automation through scripting (Python/Bash) and streamline support workflows.
- Collaborate effectively with DevOps teams on CI/CD pipelines and release readiness.
- Maintain support runbooks and documentation, and ensure ITIL-aligned operations.
- Mentor support engineers and ensure round-the-clock on-call coverage.
- Ensure application stability and performance, and communicate effectively with stakeholders.
- Participate in on-call rotations and lead the response to critical incidents.
Technical Skills Required:
- 10+ years of overall experience.
- Strong hands-on experience in Java and Spring Boot backend development.
- Experience with Amazon EKS, Kubernetes, and containerized environments.
- Proven experience supporting cloud-native microservices in production.
- Familiarity with monitoring, logging, and alerting tools (AppDynamics, Grafana, CloudWatch, ELK, Splunk).
- Knowledge of CI/CD pipelines and DevOps practices.
- Understanding of content management systems and integration pipelines.
- Experience with incident management and ITIL practices.
- Strong troubleshooting and analytical skills in complex distributed systems.
- Scripting skills in Python or Bash.
- Proven ownership of incident triage, RCA, and resolution.
- Strong communication and cross-team coordination skills.
- Experience maintaining support documentation and runbooks.
Managerial Skills:
- Team Leadership: Guide and mentor L3 support engineers; build a high-performing, collaborative team.
- Incident & Crisis Management: Lead critical incident response with composure; ensure timely resolution and stakeholder communication.
- Stakeholder Management: Coordinate effectively with business, development, QA, and DevOps teams to align priorities and expectations.
- Resource Planning: Manage team shifts, on-call rotations, and workload distribution to ensure 24×7 coverage.
- Process Governance: Enforce ITIL practices (incident, problem, and change management) and drive compliance.
- Decision-Making: Make quick, informed decisions under pressure, especially during outages or high-severity issues.
- Communication Skills: Provide clear, concise, and timely updates to technical and non-technical stakeholders.
- Continuous Improvement: Identify support process gaps and drive automation, tooling, and documentation enhancements.
- Performance Management: Track KPIs (MTTR, SLA adherence, incident volume) and drive team accountability.
- Risk & Impact Assessment: Evaluate change and deployment risks to production systems and recommend mitigation plans.
Additional Information:
This role requires participation in on-call rotations to handle critical incident responses. Strong analytical and problem-solving skills are essential for troubleshooting complex distributed systems.
Special Requirements:
On-call rotations are required. Candidates need experience with incident management and ITIL practices, and with supporting cloud-native (EKS/Kubernetes) environments.
Compensation & Location:
Salary: $100,000 – $150,000 per year (Estimated)
Location: Atlanta, GA
Recruiter / Company Contact Information:
Email: heed.s@twsol.com
Recruiter Notice:
To remove this job posting, please send an email from heed.s@twsol.com with the subject: DELETE_JOB_ID_7339