HPC Operations Engineer - Banking and Finance

1675292
  • $160,000 to $190,000 base per year
  • Chicago, United States
  • Permanent
  • Artificial Intelligence
  • AI Network


Looking for a role at the forefront of high-performance computing?

Join a global financial technology firm as a hands-on HPC Operations Engineer, working in a fast-paced environment where complex, large-scale Linux HPC systems are at the core of operations. The role involves being the primary point of contact for high-performance computing infrastructure, providing front-line support for a 24/7 environment, and ensuring that an elite research community has reliable, cutting-edge resources. This is a highly technical position for someone who thrives on solving unpredictable operational challenges and takes pride in understanding the intricacies of HPC at scale.

Step into a role where your expertise directly powers high-stakes computing and supports groundbreaking research initiatives. Apply now!


Responsibilities:

  • Operational Excellence: Providing front-line support for Linux HPC compute, storage, and RDMA interconnects.
  • Problem Solving: Managing the full lifecycle of issues raised by researchers, from initial triage to root cause analysis and resolution.
  • Automation & Tooling: Writing code (Python, Go, or C) to automate frequent tasks and building infrastructure to diagnose difficult system bottlenecks.
  • Vendor Management: Managing global vendor relationships, including occasional domestic and international travel.
  • System Integrity: Implementing performance/fault monitoring and maintaining rigorous cybersecurity standards.
  • Maintenance & On-Call: Participating in coordinated maintenance (including evenings/weekends) and a standard on-call rotation to ensure 100% uptime.


Skills/Must have:

  • Experience: 2+ years of professional Linux systems experience.
  • HPC Knowledge: Exposure to parallel filesystems (Lustre, GPFS), batch schedulers (Slurm, Grid Engine), or high-performance interconnects is a major plus.
  • Coding Skills: High proficiency in at least one language (Go, Python, or C) with the ability to pick up new ones quickly.
  • Mindset: A strong sense of urgency, the ability to work independently across multiple workstreams, and excellent communication skills.
  • Presence: This is an onsite role in Chicago (on average five days a week). You must be willing to work a weekend maintenance window (Friday evening or Saturday morning).


Benefits:

  • Large performance-based bonus 
  • Full dental, medical, and vision insurance


Salary:

  • $160,000 to $190,000 base per year

Ben Davies, Director, Global AI Infrastructure
