Ready to take the next step in your career?
Join a provider of AI cloud infrastructure delivering full-stack platforms for developers, enterprises, and research institutions to build and deploy generative AI applications. The organisation enables teams to train and run machine learning models in a secure, high-performance, and cost-efficient cloud environment, supporting faster innovation and scientific progress.
The company is seeking a Cloud Infrastructure Engineer to support a hyperscaler platform for GPU-accelerated and AI workloads. The role focuses on improving virtualization and system performance across large-scale infrastructure. It involves collaboration with specialists in high-performance computing and exposure to technologies such as RDMA, RoCE, InfiniBand, and QEMU/KVM within a fast-paced, innovation-driven environment.
Don’t miss out on this exciting opportunity. Apply today!
Responsibilities:
- Improve infrastructure supporting GPU-accelerated computing.
- Analyze the root causes of performance and reliability issues at various scales and propose effective solutions.
- Add support for new hardware across the infrastructure software stack.
- Proactively detect and resolve issues to ensure platform stability and efficiency.
Skills/Must Have:
- 5+ years of professional software development experience.
- 3+ years working with Linux systems.
- Strong system-level understanding of server architecture, PCIe devices, NICs, and kernel drivers.
- Proficiency in performance-oriented programming languages (e.g., C, C++, Go, Java, Python).
Desirable Skills:
- Experience tuning performance for HPC workloads.
- Familiarity with RDMA, RoCE, and InfiniBand networking.
- Knowledge of software-defined networking (SDN) and HPC cluster networking.
- Understanding of the QEMU/KVM virtualization stack.
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow).
- Familiarity with collective communication libraries (e.g., MPI, NCCL).
- Willingness to complete a coding interview as part of the hiring process.
Benefits:
- Competitive salary and full benefits package.
- Opportunities for professional growth and internal mobility.
- Hybrid work environment with flexibility.
- Collaborative and forward-thinking engineering culture.
Salary:
- Competitive and based on experience.