
Job ID: JOB_ID_5635

Job Summary:

We are looking for an experienced AIOps Senior Technical Architect to lead the architecture, design, and delivery of our AIOps platform. The role covers driving reliability and SLO strategy and architecting scalable multi-cloud AIOps solutions for enterprise systems. The ideal candidate will have deep expertise in observability, automation, self-healing, event intelligence, and AI-driven operations.

Key Responsibilities:

  • Define the AIOps architecture and roadmap, covering telemetry, analytics, automation, and AI/ML adoption.
  • Architect scalable observability platforms using tools like OpenTelemetry, Prometheus/Grafana, ELK/Loki, and Jaeger/Tempo.
  • Lead the design and implementation of event correlation, anomaly detection, root cause analysis (RCA) accelerators, and noise reduction mechanisms.
  • Architect automated remediation workflows, ChatOps automations, and integrations with ITSM tools like ServiceNow/JSM.
  • Integrate with Application Performance Monitoring (APM) tools such as Datadog, Dynatrace, Splunk, New Relic, and AppDynamics to build unified AIOps pipelines.
  • Drive the adoption of Service Level Objective (SLO) and Service Level Indicator (SLI) frameworks, error budgets, and reliability engineering across services.
  • Own the multi-cloud architecture (AWS/Azure/GCP), Kubernetes platform patterns, and Infrastructure as Code (IaC) standards.
  • Ensure security, compliance, Identity and Access Management (IAM), data governance, and high availability architecture for all AIOps components.
  • Mentor engineering teams, review designs, lead incident reviews, and ensure platform scalability and cost efficiency.

Required Skills:

  • AIOps & Observability: Deep expertise in OpenTelemetry, distributed tracing, and metrics/logs pipelines. Strong understanding of AIOps signals, including anomaly detection, pattern mining, and event correlation.
  • Architecture: Expertise in distributed systems, microservices, API design, high availability, and performance engineering. Experience designing real-time streaming pipelines using Kafka/Kinesis/Event Hubs.
  • APM & Monitoring Tools: Hands-on experience with Datadog, Dynatrace, Splunk, New Relic, AppDynamics, Moogsoft, or BigPanda.
  • Automation & Self-healing: Proficiency in Python/Go for automation, serverless computing, runbooks, workflow engines (Airflow/Temporal), and ChatOps bots.
  • Cloud & Platform: Strong knowledge of AWS/Azure/GCP architecture, Kubernetes (EKS/AKS/GKE), Terraform, GitOps, and CI/CD practices.
  • Security & Governance: Solid understanding of RBAC, IAM/KMS, encryption, auditability, and compliance standards (SOC 2/ISO).
  • Experience: 10-15 years in engineering, with at least 5 years in SRE, Platform, Observability, or AIOps architecture. Proven track record of designing and delivering enterprise-scale AIOps or Observability platforms.

Additional Information:

This is a contract position located in Richardson, TX. Candidates are expected to be onsite 3 days a week.


Special Requirements

Candidates must work from the office 3 days a week.


Compensation & Location

Salary: $70 – $90 per year

Location: Richardson, TX


Recruiter / Company – Contact Information

Email: hanshaik1407@gmail.com


