AI/ML Infrastructure Engineer (Training & Inference) - Hosting

1713866
  • Up to €120,000 basic salary + equity + benefits depending on experience
  • Helsinki, Finland
  • Permanent
  • Artificial Intelligence
  • AI Software


Looking for a role with plenty of growth opportunities?

Join a rapidly scaling AI cloud infrastructure provider building a next-generation GPU platform designed for AI training, experimentation, and inference at scale. The company is developing a fully featured AI cloud platform powered by renewable energy. It already has strong momentum across Europe and is now significantly expanding its footprint in the United States.

The company is looking for an AI/ML Infrastructure Engineer to help optimise large-scale training and inference workloads across next-generation GPU environments. The role sits at the intersection of AI infrastructure, platform engineering and customer-facing optimisation work, supporting some of the most advanced AI workloads running in Europe.

Don’t miss out on this opportunity: apply today!


Responsibilities:

  • Work closely with AI/ML customers to optimise large-scale training and inference workloads.
  • Support deployment, troubleshooting and performance tuning across GPU-heavy AI environments.
  • Build and improve internal ML platforms running on Kubernetes.
  • Support job scheduling, workflow orchestration and distributed training infrastructure.
  • Improve inference platforms including model packaging, serving frameworks and latency optimisation.
  • Optimise GPU utilisation, networking and overall workload efficiency.
  • Support technologies such as vLLM, TensorRT-LLM, Triton, Ray, Flyte or Slurm.
  • Troubleshoot performance bottlenecks across CUDA, storage, networking and distributed systems.
  • Translate customer and engineering requirements into scalable platform improvements.
  • Work closely with infrastructure, platform and software engineering teams to improve reliability and performance across the AI cloud platform.


Skills/Must have:

  • Strong experience in AI/ML infrastructure, HPC or platform engineering environments.
  • Hands-on experience supporting model training, fine-tuning or inference workloads at scale.
  • Strong Python skills and experience with Linux environments.
  • Experience with PyTorch or JAX.
  • Good understanding of Kubernetes, containers and distributed systems.
  • Experience debugging GPU performance issues across CUDA, drivers, networking or storage.
  • Experience with CI/CD, GitOps or infrastructure automation workflows.
  • Knowledge of GPU networking and performance technologies such as InfiniBand, NCCL or NVLink is highly beneficial.
  • Experience with inference frameworks such as vLLM, Triton or TensorRT-LLM is advantageous.
  • Strong communication skills, with the ability to work directly with highly technical customers and engineering teams.


Benefits:

  • Shares scheme


Salary:

  • Up to €120,000 basic salary + equity + benefits depending on experience

Holly Staff, Head of AI & Data Center Benelux
