Infrastructure Engineer (AI) - Hosting

1674715
  • S$200,000 base per year
  • Adam Park, Singapore
  • Permanent
  • Artificial Intelligence
  • AI Network
  • AI Software


Looking to advance your career in AI and high-performance computing while working with next-generation GPU infrastructure?

Join a technology team that provides scalable GPU computing solutions and global infrastructure for AI and compute-intensive workloads. The team focuses on simplifying access to high-performance systems, allowing engineers to deploy, manage, and optimize resources efficiently across multiple environments. Team members work on real-world projects, collaborating closely with experienced professionals in a fast-paced, innovative environment. This role offers unparalleled exposure to the latest AI technologies, the chance to work with industry-leading customers, and the ability to make a tangible impact on the future of AI.

Apply now to grow your expertise and play a key role in shaping the future of GPU infrastructure and AI computing!


Responsibilities:

  • Get AI Platform customers production-ready on the platform, standing up Kubernetes clusters, configuring GPU drivers, validating networking, and troubleshooting the issues that surface when real workloads hit real hardware.
  • Own the bare metal platform layer (NCCL, InfiniBand, NVLink, storage) and its integration with orchestration layers (Kubernetes, SLURM) and the MLOps tooling that customers actually use.
  • Configure, benchmark, and debug NVIDIA driver stacks, firmware versions, CUDA compatibility, NCCL tuning, and MIG configurations.
  • Run quality benchmarks and diagnostics to validate performance for inference and training workloads across chip types.
  • Identify gaps before customers do, pressure-testing the infrastructure, APIs, and workflows to find what's missing or broken.
  • Turn customer learnings into product, working with Product and Engineering to build reusable templates, default configurations, and automated workflows that eliminate manual onboarding.
  • Advise customers on chip selection and tokenomics, helping AI platform customers understand price/performance trade-offs across GPU types, cost-per-token economics, and which hardware fits their inference or training workloads.


Skills/Must have:

  • Bare metal Linux depth: experience administering GPU servers at the metal, including driver stacks, kernel tuning, firmware, and storage configuration.
  • NVIDIA GPU stack expertise: drivers, CUDA, NCCL, NVLink, nvidia-smi profiling, and a good understanding of how stack compatibility affects performance.
  • Kubernetes and orchestration: production experience with K8s, SLURM, or similar. You know how to stand up clusters, not just deploy to them.
  • AI Networking fundamentals: TCP/IP, VLANs, bonding, and high-speed interconnects (InfiniBand, RoCE) for distributed workloads.
  • Customer-facing communication: work directly with engineers at AI platform companies, understand their constraints, and translate that into clear requirements for your team.
  • Bias toward scalable solutions: you'd rather build a feature that helps 10 customers than a custom deployment that helps one.


Benefits: 

  • Comprehensive health, dental, and vision insurance; 401(k) with employer matching
  • 10-15% bonus
  • Equity options


Salary: 

  • S$200,000 base per year

Ben Davies, Director, Global AI Infrastructure

Apply for this role