Senior ML Infrastructure Engineer (Research Initiatives) - Systems Integrator
- Super Competitive (Base + Bonus + High-Upside Equity)
- Remote, UK
- Permanent
- Artificial Intelligence
- AI Software
Looking to architect next-generation AI infrastructure that transforms engineering simulations?
Join a world-class AI research laboratory, currently at Series B and moving toward Series C, that is redefining engineering with Large Physical Models, which replace traditional numerical simulations with high-speed AI inference. You will build and manage the infrastructure required to train complex simulation models on hundreds of GPUs, lead the orchestration of massive-scale training environments, and contribute to pioneering Physical AI architectures that go far beyond standard generative models. You will work in a high-growth, NVIDIA-backed environment with exposure to cutting-edge AI hardware and sovereign cloud technologies.
Ready to design and scale infrastructure that powers the future of AI-driven engineering? Apply now.
Responsibilities:
- Design and scale training environments using PyTorch Distributed, JAX, or NVIDIA NeMo across multi-node/multi-GPU clusters.
- Manage a mixed-compute strategy spanning public clouds (AWS/Azure) and sovereign industrial clouds for sensitive data.
- Implement optimization techniques like FSDP and custom kernels to maximize FLOPS for irregular mesh and 3D geometric data.
- Build high-performance pipelines for ingesting CAE/CFD/FEA engineering data, eliminating I/O bottlenecks.
- Integrate traditional physics solvers (OpenFOAM/Simcenter) into ML pipelines for active learning and model refinement.
- Set up "physics-aware" CI/CD and experiment tracking (Kubeflow/MLflow) that check model outputs against physical consistency laws.
Skills/Must have:
- Orchestration: Expert-level Kubernetes (AKS/EKS) is essential.
- ML Frameworks: Strong proficiency in PyTorch, JAX, or NVIDIA NeMo.
- HPC/Data: Solid experience with Python, Go, and distributed data tools (Dask/Spark).
- Background: Experience in AI research labs (e.g., DeepMind, OpenAI) or Neocloud environments.
Benefits:
- Competitive Equity: Significant stock option packages in a fast-scaling Series B/C firm.
- Flexible "London-Plus" Setup: Remote-first within Europe/UK, with roughly 1 week per month in London (all travel/accommodation fully paid).
- High-Impact Culture: Work alongside world-renowned physicists, mathematicians, and Formula 1 simulation veterans.
- Sponsorship: Full visa sponsorship available for top-tier global talent.
Salary:
- Super Competitive (Base + Bonus + High-Upside Equity)
- Tailored to attract the best in the industry.