Site Reliability Engineer - AI

1704941
  • Competitive salary and comprehensive benefits package.
  • San Francisco, California, United States
  • Permanent
  • Artificial Intelligence
  • AI Data Center
  • AI Software


Are you looking for an exciting opportunity?

Join a specialist technology provider delivering advanced provisioning, management, and security solutions for data centers. The organization helps operators enhance customer experience, streamline day-to-day operations, and stay ahead of the competition through innovative products and services, allowing them to focus on their core strengths in hardware and infrastructure.

Your next opportunity starts here—apply today.


Responsibilities:

  • Install and integrate Hydra’s Brokkr software with new data centers and onboarded servers.
  • Maintain integrated data centers and their inventory, respond to L2 and L3 requests and alerts, and improve monitoring and other supporting infrastructure.
  • Monitor system performance and uptime, ensuring the highest level of systems and infrastructure availability.
  • Liaise with vendors and other IT personnel for problem resolution. 
  • Install, configure, test, and maintain operating systems, application software, and system management tools. 
  • Maintain security, backup, and redundancy strategies. 
  • Write and maintain custom scripts to increase system efficiency and reduce the time spent on manual intervention.
  • Participate in the design of information and operational support systems. 


Required Skills/Qualifications:

  • BS/MS degree in Computer Science, Engineering, or a related subject. Equivalent experience accepted.  
  • Proven working experience in installing, configuring, and troubleshooting UNIX/Linux-based environments. 
  • Solid experience in the administration and performance tuning of application stacks (e.g., Apache, MySQL, NGINX). 
  • Experience with virtualization and containerization (e.g., QEMU/KVM, Docker). 
  • Experience with monitoring systems (e.g., Nagios, Zabbix). 
  • Experience with automation software (e.g., Puppet, Chef, Ansible). 
  • Solid scripting skills (e.g., shell scripts, Perl, Ruby, Python). 
  • Solid networking knowledge (OSI network layers, TCP/IP, DNS, DHCP). 


Desirable Skills:

  • Certifications in relevant fields (e.g., Linux certifications, Cisco Certified Network Associate - CCNA, Microsoft Certified Systems Engineer - MCSE) are a plus.
  • Experience with cloud services (AWS, Microsoft Azure) is a plus. 
  • Strong problem-solving skills and the ability to work under pressure are a must.
  • Strong communication skills, the ability to collaborate, and a proactive approach to asking questions are a must.


Benefits:

  • Flexible working hours and remote work opportunities. 
  • A supportive team environment with an emphasis on learning and growth. 
  • Access to cutting-edge technology and tools.


Salary:

  • Competitive salary and comprehensive benefits package.

Contact: Ben Davies, Director, Global AI Infrastructure
