Senior Software Engineer, AI Infrastructure
About Us
At 3Y Health, we are building AI-driven software to empower healthcare providers and solve the overwhelming administrative complexity that consumes 40% of the industry’s revenue. Our end-to-end platform unlocks opportunities for clinician entrepreneurs, enabling medical professionals to launch, run, and grow private practices. By supporting these independent practices with the latest AI and automation, we’re helping providers reclaim their time, build thriving businesses, and deliver better outcomes for their communities. 3Y Health is backed by over $200M from top-tier investors including Founders Fund, General Catalyst, SoftBank, and 8VC.
About the Role
We're seeking a Senior Software Engineer, AI Infrastructure to architect and scale our internal machine learning platform. Unlike a traditional ML Engineer, you will focus not on developing models but on building and optimizing the infrastructure that supports them. You’ll collaborate closely with researchers, engineers, and product teams to support training, evaluation, and deployment workflows, ensuring our systems are robust, scalable, and efficient. This role is hybrid, with an expectation of 3 days in the office.
Responsibilities
- Design and maintain distributed systems for ML model training and evaluation.
- Build and scale ML pipelines using frameworks and tools such as PyTorch, Apache Spark, and Kubernetes.
- Optimize compute performance, GPU utilization, and data throughput using technologies like CUDA, NVIDIA Triton, and low-level tuning.
- Develop infrastructure abstractions to standardize model experimentation and deployment.
- Partner with ML researchers and product teams to ensure the infra stack aligns with evolving ML needs.
- Establish robust monitoring, profiling, and logging tools for model performance and infrastructure reliability.
- Contribute to infrastructure codebases and tooling libraries used across the ML lifecycle.
Qualifications
- 3+ years of software engineering experience, with at least 2 years focused on ML infrastructure or distributed systems.
- Deep familiarity with ML frameworks and tools (e.g., PyTorch, TensorFlow, Spark, Ray, and the NVIDIA stack).
- Production experience with Kubernetes, container orchestration, and scaling training workloads across clusters.
- Strong understanding of GPU architecture and performance optimization with CUDA or similar technologies.
- Practical experience in building end-to-end ML pipelines, from training to serving, but with an infra-first mindset.
- Comfortable writing clean, scalable code in Python, Go, or a systems language (e.g., C++, Rust).
- Interest in work beyond DevOps/SRE: this is a deeply technical infrastructure role rooted in ML systems.
Compensation
The estimated salary range for this role is $175,000 - $205,000. Total compensation for this position may also include stock options. Note that total compensation for this position will be determined by each individual’s relevant qualifications, work experience, skills, and other factors.