Posted 7 hours ago

Job ID: JOB_ID_6338

Job Summary:

Support and troubleshoot Linux-based cloud and HPC environments used for silicon design and verification workflows, with a focus on minimizing customer downtime and building trust through consistent follow-through. The role covers diagnosing system-level issues across compute, storage, networking, and identity services; monitoring cluster performance, job throughput, and queue health; resolving license and scheduler issues (e.g., Slurm, LSF, SGE); and automating recurring operational tasks such as system health, performance, and capacity monitoring and user provisioning and de-provisioning. The systems involved contain ITAR-controlled and CUI data and must be operated in compliance with regulatory and company requirements.

Key Responsibilities:

  • Actively focus on delighting customers by minimizing downtime, anticipating needs, and exceeding service expectations.
  • Build trust with customers through professionalism, accountability, and consistent follow-through.
  • Support and troubleshoot Linux-based cloud environments used for silicon design and verification workflows.
  • Diagnose and resolve system-level issues across compute, storage, networking, and identity services.
  • Monitor HPC cluster performance, job throughput, and queue health.
  • Identify and remediate HPC job performance issues, including scheduler configuration, resource contention, I/O bottlenecks, and memory constraints.
  • Troubleshoot and resolve license availability, utilization, and checkout issues impacting customer workloads.
  • Support distributed resource managers (e.g., Slurm, LSF, SGE, or similar technologies).
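  • Develop and maintain automation to streamline recurring operational tasks, including system health, performance, and capacity monitoring, and user provisioning and de-provisioning.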
  • Use agentic AI, Python, shell scripting, Perl, or similar technologies to reduce manual effort and improve mean time to resolution (MTTR).
  • Operate and support systems containing ITAR-controlled and CUI data in compliance with regulatory and company requirements.
  • Create and maintain runbooks, knowledge base articles, and customer-facing documentation.

Required Skills & Experience:

  • Strong hands-on experience with Linux system administration and troubleshooting.
  • Experience supporting HPC or large-scale compute environments.
  • Proficiency in Python, shell scripting, Perl, or other automation-focused programming languages.
  • Experience with monitoring tools.
  • Ability to work with export-restricted data (ITAR/CUI).
  • Experience supporting EDA, semiconductor, or silicon design environments (preferred).
  • Experience applying AI-assisted or autonomous automation in operations (preferred).
  • Bachelor's degree in Computer Science or equivalent practical experience.

Special Requirements

ITAR/CUI data compliance required. Experience with EDA, semiconductor, or silicon design environments preferred. AI-assisted or autonomous automation experience preferred.


Compensation & Location

Salary: $50 – $70 per year (Estimated)

Location: Boston, MA


Recruiter / Company – Contact Information

Email: muralikaizon@gmail.com


