Storage Engineer - Hosting
- $200,000 base salary
- Miami, Florida, United States
- Permanent
- Artificial Intelligence
- AI Data Center
- AI Network
- AI Software
Ready to help build the backbone of next-generation AI?
Join a Founders Fund-backed NVIDIA cloud partner that is building the high-performance infrastructure powering some of the world’s most ambitious AI research. In GPU-as-a-Service, the bottleneck isn’t compute; it’s data.
As a Storage Engineer, you will design and implement the data layer that enables foundation model training and enterprise-grade production inference. The role calls for a clear understanding that AI at scale demands more than capacity: it demands massive throughput, ultra-low latency, and the ability to keep thousands of GPUs fed without stalls.
Take the next step in your career and help shape the infrastructure that drives the future of AI.
Responsibilities:
- Design & Deploy AI Storage: Architect and implement high-performance parallel file systems (Weka, Lustre, or similar) optimised specifically for GPU-heavy workloads and multi-node training.
- Optimise Data Pipelines: Tune storage performance to get the most out of GPUDirect Storage (GDS), minimising latency between the storage fabric and GPU memory.
- Manage Scale & Reliability: Build and maintain petabyte-scale storage clusters across multiple global data centers, ensuring 99.99% uptime for mission-critical AI research labs.
- Infrastructure Integration: Partner with Network and Data Center engineers to configure high-speed storage networking (InfiniBand/400G Ethernet) and ensure seamless backend connectivity.
- Automate Storage Ops: Develop Terraform providers, Ansible playbooks, or Python scripts to automate the provisioning, monitoring, and scaling of storage resources.
- Troubleshoot Complex I/O: Act as the Tier-3 lead for storage-related performance degradation, identifying root causes in the filesystem, network, or Linux kernel.
Skills/Must have:
- Specialised Storage Expertise: 5+ years of experience with high-performance storage solutions (WekaIO, VAST Data, BeeGFS, or DDN) in a Linux-heavy environment.
- AI Infrastructure Knowledge: Deep understanding of how storage interacts with NVIDIA GPU stacks (HGX/DGX) and the specific I/O patterns of ML training (checkpoints, small file reads, etc.).
- Networking Proficiency: Hands-on experience with InfiniBand, RoCEv2, and NVMe-over-Fabrics (NVMe-oF).
- Systems Automation: Strong scripting skills in Python, Go, or Bash, and experience with IaC tools like Terraform or Pulumi.
- Linux Internals: Deep knowledge of the Linux storage stack, including XFS/ZFS, LVM, and kernel tuning for high-throughput networking.
Benefits:
- 10% bonus
- Stock options
Salary:
- $200,000 base salary