Job ID: JOB_ID_5160
Job Title: Network Reliability Engineer (NRE)
We are looking for a highly skilled Network Reliability Engineer (NRE) with a minimum of 14 years of experience to join our team.
Core Focus:
- Ongoing health, observability, and operational stability of the network fabric.
- Network Context: Leaf-spine architecture (currently VLAN-based segmentation) with a mix of low-speed control/management networks and high-speed data acquisition paths. RDMA/RoCE and a potential evolution toward an L3 routed fabric at larger scale are under future consideration (beta phase).
Key Responsibilities:
- Build post-handoff network observability.
- Implement and enhance telemetry, advanced monitoring, and health dashboards.
- Detect and alert on fiber/optic degradation, port failures, link state changes, and environmental impacts (construction activity, cable damage).
- Work with Spectrum-X telemetry and advanced QoS policies.
- Establish operational workflows: detection, alerting, remediation.
- Maintain fabric stability after the PS team exits.
- Support integration troubleshooting across compute, storage, and network components.
Requirements:
- Minimum 14 years of experience.
- Containers: Kubernetes, Docker, etc.
- Monitoring and Automation Tools: Grafana, Ansible.
- Cluster Management.
- Open-source data tools: Kafka.
- Cloud Databases: AWS database services.
- Linux.
- HPC-related tools.
Additional Information:
- Location: Milpitas, CA (On-site).
- Work Authorization: Green Card holders and US citizens only.
Compensation
Salary: $120,000 – $170,000 per year (Estimated)
Recruiter / Company – Contact Information
Email: a.gupta@rulesiq.com