Senior Inference Engineer - AI Infrastructure
- $250,000 gross per year
- San Francisco, California, United States
- Permanent
- 250000
- Artificial Intelligence
- AI Software
Ready to architect AI infrastructure that powers next-generation research and cloud platforms?
Join a stealth-mode hyperscale data center startup building a next-generation AI and cloud platform designed for startups and advanced research, powered by thousands of H100, H200, and B200 GPUs available on demand.
The team is now building a serverless inference platform, beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to join as a Senior Inference Engineer at an early stage and help define the architecture, scalability, and technical direction of that platform.
Build resilient, scalable AI platforms that empower startups and innovation. Apply today!
Key Responsibilities:
- Take ownership of the inference platform architecture, from batch to low-latency workloads.
- Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts.
- Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types.
- Develop APIs, orchestration layers, and autoscaling logic to support both multi-tenant and dedicated deployments.
- Collaborate with cross-functional teams to translate business and customer needs into robust technical solutions.
- Stay up to date with the latest models, serving frameworks, and optimisation techniques, applying best practices in performance and efficiency.
- Implement monitoring, alerting, and observability workflows for production systems.
Requirements:
- 5+ years’ experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar).
- Proficiency in Python, Go, Rust, or a comparable language.
- Strong understanding of GPU software stacks (CUDA, Triton, NCCL) and Kubernetes orchestration.
- Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments.
- Knowledge of performance optimisation techniques, including batching, speculative decoding, quantisation, and caching.
- Familiarity with Infrastructure-as-Code tools (Terraform, Helm) and low-level OS performance tuning.
Preferred Skills:
- Experience with event-driven or serverless architectures.
- Exposure to hybrid cloud or multi-cluster environments.
- Contributions to open-source ML or inference systems projects.
- Proven track record of cost optimisation in high-performance compute environments.
Benefits:
- IPO Equity