Full-time, Infrastructure

Infrastructure Engineer

United States - Remote

Posted: 1mo ago

Type: Full-time

Location: United States - Remote

Job Overview

About the role

Own the platform that powers our accelerator cloud. Your scope spans bare-metal provisioning, multi-tenant Kubernetes, SLURM scheduling, control planes, and the automation and observability that keep thousands of compute nodes running as a single production system.

What you'll do

• Build the control plane and APIs that unify our compute fleet
• Own provisioning and lifecycle from rack bring-up to node retirement
• Operate the scheduling layer for training and inference workloads
• Architect multi-tenancy: isolation, quota, fairness, and accounting
• Build automation that eliminates manual operations
• Drive reliability, observability, and incident response across the fleet

What you'll need

• BS in CS, EE, or related field, or equivalent experience
• 5+ years in infrastructure, platform, or backend engineering
• Advanced software engineering skills: Rust, Go, or Python
• Deep understanding of Linux, storage, and distributed systems
• Experience with workload schedulers: SLURM, Kubernetes scheduling, or equivalent
• Expertise with automation tooling: Terraform, Ansible, Helm
• Experience architecting multi-tenant systems
• Production SRE experience: on-call, incident response, observability

What we offer

• Top-tier compensation structured to recognize and retain the best talent
• Meaningful equity
• Comprehensive medical, dental, vision, life, and disability insurance
• Parental leave for all new parents, including adoptive and surrogate journeys
• Flexible PTO
• Paid holidays
• Relocation support

Equal Employment Opportunity

We're an Equal Opportunity Employer and do not discriminate on the basis of any protected status under applicable law.

Core Requirements

Infrastructure