Senior GPU Infrastructure Engineer - AI Infrastructure
- $120,000 – $160,000 base
- Houston, Texas
- Permanent
- Artificial Intelligence
- AI Network
Join a newly funded GPUaaS provider building high-performance AI infrastructure from the ground up in Houston, Texas. This is a hands-on engineering role focused on deploying and operating NVIDIA-powered GPU clusters designed for large-scale AI training and inference workloads.
You’ll be one of the first technical hires on the ground, with direct ownership over GPU networking and cloud infrastructure, shaping how the platform is built, scaled, and operated. The environment is greenfield, fast-moving, and highly technical, with real influence over architectural decisions.
If you want to work daily with cutting-edge NVIDIA GPU stacks, own core infrastructure rather than inherit it, and join early at a well-funded GPUaaS company, this role offers rare depth and impact.
Interested? Get in touch and apply today.
Responsibilities:
- Design, deploy, and operate GPU clusters using NVIDIA networking and interconnect technologies (InfiniBand, NVLink, RoCE).
- Build and manage GPU cloud infrastructure using OpenStack as the core control plane.
- Configure and maintain high-performance networking for AI workloads, optimising latency, throughput, and reliability.
- Support provisioning, scaling, and lifecycle management of GPU nodes and tenant environments.
- Work closely with hardware, datacenter, and platform teams to ensure tight integration between compute, network, and storage layers.
- Troubleshoot complex performance and connectivity issues across GPU, network, and virtualisation layers.
- Contribute to automation, documentation, and operational best practices.
Skills/Must have:
- Strong hands-on experience with NVIDIA GPU infrastructure and the associated networking stack (InfiniBand, RoCE, NCCL, CUDA-aware networking).
- Solid production experience operating OpenStack environments (Nova, Neutron, Cinder; GPU passthrough a plus).
- Deep Linux systems knowledge in high-performance or large-scale environments.
- Experience supporting AI/ML training or inference workloads at scale.
- Comfortable working on-site 5 days per week in Houston.
- Ability to operate autonomously in an early-stage, high-ownership environment.
Benefits:
- Potential equity/bonus
Salary:
- $120,000 – $160,000 base (depending on experience)