HPC Operations Engineer - Banking and Finance
- $160,000 to $190,000 base per year
- Chicago, United States
- Permanent
- Artificial Intelligence
- AI Network
Looking for a role at the forefront of high-performance computing?
Join a global financial technology firm as a hands-on HPC Operations Engineer in a fast-paced environment where complex, large-scale Linux HPC systems are at the core of operations. You will be the primary point of contact for high-performance computing infrastructure, provide front-line support for a 24/7 environment, and ensure that an elite research community has reliable, cutting-edge resources. This is a highly technical position for someone who thrives on solving unpredictable operational challenges and takes pride in understanding the intricacies of HPC at scale.
Step into a role where your expertise directly powers high-stakes computing and supports groundbreaking research initiatives. Apply now!
Responsibilities:
- Operational Excellence: Providing front-line support for Linux HPC compute, storage, and RDMA interconnects.
- Problem Solving: Managing the full lifecycle of issues raised by researchers, from initial triage to root cause analysis and resolution.
- Automation & Tooling: Writing code (Python, Go, or C) to automate frequent tasks and building infrastructure to diagnose difficult system bottlenecks.
- Vendor Management: Managing global vendor relationships, including occasional domestic and international travel.
- System Integrity: Implementing performance/fault monitoring and maintaining rigorous cybersecurity standards.
- Maintenance & On-Call: Participating in coordinated maintenance (including evenings/weekends) and a standard on-call rotation to ensure 100% uptime.
Skills/Must have:
- Experience: 2+ years of professional Linux systems experience.
- HPC Knowledge: Exposure to parallel filesystems (Lustre, GPFS), batch schedulers (Slurm, Grid Engine), or high-performance interconnects is a major plus.
- Coding Skills: High proficiency in at least one language (Go, Python, or C) with the ability to pick up new ones quickly.
- Mindset: A strong sense of urgency, the ability to work independently across multiple workstreams, and excellent communication skills.
- Presence: This is an onsite role in Chicago (5 days a week on average). You must be willing to work a weekend maintenance window (Friday evening or Saturday morning).
Benefits:
- Large performance-based bonus
- Full dental, medical, and vision insurance