Storage Engineer - Hosting

Job reference: 1675173
  • $200,000 base salary
  • Miami, Florida, United States
  • Permanent
  • Artificial Intelligence
  • AI Data Center
  • AI Network
  • AI Software

Ready to help build the backbone of next-generation AI?

Join a Founders Fund-backed NVIDIA cloud partner building the high-performance infrastructure behind some of the world’s most ambitious AI research. In GPU-as-a-Service, the bottleneck isn’t compute; it’s the data.

As a Storage Engineer, you will design and implement the data layer that enables foundation model training and enterprise-grade production inference. AI at scale demands more than capacity: it demands massive throughput, ultra-low latency, and the ability to feed thousands of GPUs without stalling.

Take the next step in your career and help shape the infrastructure that drives the future of AI.


Responsibilities:

  • Design & Deploy AI Storage: Architect and implement high-performance parallel file systems (Weka, Lustre, or similar) optimised specifically for GPU-heavy workloads and multi-node training.
  • Optimise Data Pipelines: Fine-tune storage performance to ensure maximum GPUDirect Storage (GDS) efficiency, minimising latency between the storage fabric and the GPU memory.
  • Manage Scale & Reliability: Build and maintain petabyte-scale storage clusters across multiple global data centers, ensuring 99.99% uptime for mission-critical AI research labs.
  • Infrastructure Integration: Partner with Network and Data Center engineers to configure high-speed storage networking (InfiniBand/400G Ethernet) and ensure seamless backend connectivity.
  • Automate Storage Ops: Develop Terraform providers, Ansible playbooks, or Python scripts to automate the provisioning, monitoring, and scaling of storage resources.
  • Troubleshoot Complex I/O: Act as the Tier-3 lead for storage-related performance degradation, identifying root causes in the filesystem, network, or Linux kernel.


Skills/Must have:

  • Specialised Storage Expertise: 5+ years of experience with high-performance storage solutions (WekaIO, VAST Data, BeeGFS, or DDN) in a Linux-heavy environment.
  • AI Infrastructure Knowledge: Deep understanding of how storage interacts with NVIDIA GPU stacks (HGX/DGX) and the specific I/O patterns of ML training (checkpoints, small file reads, etc.).
  • Networking Proficiency: Hands-on experience with InfiniBand, RoCEv2, and NVMe-over-Fabrics (NVMe-oF).
  • Systems Automation: Strong scripting skills in Python, Go, or Bash, and experience with IaC tools like Terraform or Pulumi.
  • Linux Internals: Deep knowledge of the Linux storage stack, including XFS/ZFS, LVM, and kernel tuning for high-throughput networking.


Benefits:

  • 10% bonus
  • Stock options


Salary:

  • $200,000 base salary

Ben Davies, Director, Global AI Infrastructure