GPU Cluster Architect - Technology and Cloud Infrastructure Provider

1651455 Posted: 16/01/2026

Up to €200,000 gross per year
Amsterdam [Netherlands]
Permanent
Artificial Intelligence
AI Network

Our client is a global technology and cloud infrastructure provider specialising in high-performance platforms designed to support AI and machine-learning workloads. Operating as a next-generation cloud service provider, they deliver large-scale compute, GPU-accelerated environments and managed services that enable organisations to build, train and deploy advanced applications at scale. With a strong international footprint across Europe and North America, the business combines cutting-edge infrastructure with developer-focused tools to provide secure, scalable and cost-effective access to AI-ready cloud solutions.

We’re looking for a GPU Cluster Architect to join our client and lead the design and development of next-generation AI infrastructure powering large-scale, GPU-accelerated workloads. In this hands-on role, you’ll own architectural decisions across compute, networking, and storage, building platforms capable of supporting the scale, performance, and reliability demands of modern AI and ML systems.

You’ll define how tens of thousands of GPUs are interconnected, powered, cooled, and optimized across multiple data center sites. Working alongside world-class engineering teams, you’ll shape the backbone of one of the most advanced AI clouds in the world.

If you’re passionate about designing ultra-scale systems, optimizing performance for LLM training and inference, and building the core infrastructure that powers AI innovation, this is your opportunity.

Responsibilities:

Architect scalable GPU cluster topologies spanning compute nodes, interconnects (InfiniBand, Ethernet), storage, and control planes
Model and analyze AI/ML workloads (LLM training, inference) to drive tradeoffs in latency, bandwidth, GPU density, and performance
Collaborate with network architects to design and validate low-latency, high-throughput interconnects (InfiniBand HDR/NDR, RoCEv2) at POD and data center scale
Integrate and optimize storage solutions to support training datasets, checkpointing, and high-performance I/O operations
Design for reliability, incorporating telemetry, automation, and monitoring to detect and resolve issues early
Partner with cross-functional teams including SRE, networking, storage, and data center engineering to operationalize your designs

Skills / Must Have:

5+ years of experience designing GPU or HPC clusters at scale
Deep understanding of modern GPU architectures (NVIDIA, AMD)
Expertise with HPC interconnects (InfiniBand, RoCE) and low-latency networking
Strong background in systems architecture, compute, and hardware reliability
Proficiency in scripting and automation (Python, Go)

Bonus If You Have:

Experience with AI/ML workload optimization and performance modeling
Familiarity with large-scale data center design and cooling/power strategies
Exposure to orchestration systems (Kubernetes, Slurm) or telemetry frameworks

Benefits:

Bonus scheme
Company shares
Flexible remote working

Salary:

Up to €200,000 gross per year

Holly Staff Principal Network Consultant BLX
