Posted 7 hours ago

Job ID: JOB_ID_8668

Job Description: Sr. Site Reliability Engineer

We are looking for a highly experienced Senior Site Reliability Engineer with over 15 years of IT experience, including 8+ years specifically in Site Reliability Engineering practices. This role requires a deep understanding of microservices, Kubernetes, Docker, AWS Cloud, and application servers like Oracle, IBM, and Tomcat. Proficiency in monitoring tools such as New Relic is essential, along with experience in troubleshooting JVM failures, JDBC connection leaks, and service integration issues.

The ideal candidate will have a strong grasp of business flows, customer experience metrics (KPIs and SLAs), and logging frameworks like Elastic/OpenSearch, Logstash, and Kibana. Experience in the Telecom domain is a plus. You will be responsible for ensuring the reliability of applications, defining SLAs, SLIs, and SLOs, and performing capacity planning and performance tuning.

  • Role: Sr. Site Reliability Engineer
  • Location: Miami, FL (5 days onsite)
  • Duration: Long Term Contract
  • Experience: 15+ Years
  • Visa Sponsorship: H4-EAD/L2-EAD/TN/USC/OPT EADs Only, open for C2C
  • Required Skills:
    • 15+ years of experience in IT
    • 8+ Years of experience in Site Reliability Engineering Practices
    • Strong understanding of microservices, Kubernetes, Docker, AWS Cloud, Oracle/IBM/Tomcat application servers, and New Relic
    • Good understanding of Business flows, Customer Experience, KPIs and SLAs
    • Proficiency in Logging frameworks and tools like Elastic/OpenSearch, Logstash and Kibana
    • Experience in troubleshooting JVM failures, JDBC connection leaks and service integration failures
    • Experience with application monitoring tools like New Relic (dashboard creation)
    • Experience working in Telecom Domain (preferred)
    • In-depth knowledge of configuring, tuning, and maintaining Java application servers and microservices on the Kubernetes platform
    • Strong understanding of SDLC
    • Experience working with CI/CD pipelines using FlexDeploy, Jenkins, Artifactory, etc.
    • Working experience with web servers, application servers, Java Message Service (JMS queues and topics), and containerized microservices
    • Good understanding of the Kubernetes platform and of service mesh and traffic-management tooling such as Istio and NGINX
    • Hands-on experience with AWS services like EC2, ALB, and NLB, and with RabbitMQ
    • Responsible for application reliability and for defining SLAs, SLIs, and SLOs
    • Capacity planning, JDBC tuning and performance tuning
    • Ability to provide requirements for performance and chaos tests and to analyze their results
    • Strong experience with and understanding of SOAP and REST web services
    • Strong Log analysis skills (System Errors vs Business Errors)
    • Assess and implement best practices for Observability and tracing
    • Strong Incident Management and Problem Management Skills
    • Strong knowledge of load balancers, HTTP/HTTPS protocols, and networking concepts
    • Collaborate with multiple teams for Incident resolution

This role involves working with cutting-edge technologies and requires a proactive approach to maintaining and improving system reliability. You will collaborate with various teams to resolve incidents and ensure the smooth operation of critical applications. The position is a long-term contract based in Miami, FL, requiring 5 days onsite.


Special Requirements

H4-EAD/L2-EAD/TN/USC/OPT EADs Only, open for C2C, 5 days onsite


Compensation & Location

Salary: $130,000 – $190,000 per year (Estimated)

Location: Miami, FL


Recruiter / Company – Contact Information

Email: ad.s@logicplanet.com

