Staff Software Engineer, AI/ML Infrastructure
The Chan Zuckerberg Initiative was founded by Priscilla Chan and Mark Zuckerberg in 2015 to help solve some of society’s toughest challenges — from eradicating disease and improving education to addressing the needs of our local communities. Our mission is to build a more inclusive, just, and healthy future for everyone.
The Team
Our Central Tech team provides technology and security support for CZI, the Biohub Network, and our grantees. We believe that Engineering and Security are most effective when in sync and learning from each other on a daily basis. Our AI Infrastructure Engineering team enables our AI Research teams to achieve their goals faster and more securely. We leverage technology to automate manual processes, constantly innovate to optimize operations, provide first-class support, and build solutions to enable the scale and execution of our business partners' strategies and initiatives.
The Opportunity
The AI/ML and Data Engineering Infrastructure organization builds shared tools and platforms used across the Chan Zuckerberg Initiative and CZ Biohub, partnering with and supporting a wide range of Research Scientists, Data Scientists, and AI Research Scientists, as well as Engineers focused on Education and Science domain problems. Members of the Central Tech infrastructure engineering team have an impact on all of CZI's initiatives by enabling the technology solutions used by other engineering teams at CZI to scale. A person in this role will build these technology solutions and help cultivate a culture of shared best practices and knowledge around AI/ML infrastructure.
What You'll Do
- Lead the design and delivery of secure, scalable, and high-performance AI/ML compute infrastructure.
- Architect and implement containerized AI/ML platforms using Kubernetes for heterogeneous, distributed environments.
- Integrate on-prem (High Performance Compute) and cloud-based AI platforms with GPU clusters to support pre-training, training, fine-tuning, and inference workflows.
- Define and execute systems integration strategies to maximize performance, scalability, and security for AI workloads.
- Enable research teams to effectively use AI platforms through best practices in lifecycle management and deployment.
- Solve complex challenges in scaling AI workflows and optimizing model training and inference pipelines.
What You'll Bring
- BS/MS in Computer Science or related field, or equivalent experience, with 8+ years in coding and systems architecture/design across AI/ML and core infrastructure.
- Proven proficiency in a systems language (C, C++, C#, Go, Rust, Java, Scala) and a scripting language (Python, PHP, Ruby).
- Expertise in cloud platforms (AWS, GCP, Azure) and hybrid environments, including on-premises and colocation hosting.
- Strong experience in AI/ML platform operation technologies (e.g. Slurm, SUNK, Run:ai, Kubeflow).
- Advanced skills in scaling and securing containerized applications on Kubernetes, including custom container development and CI/CD integration.
- Working knowledge of Nvidia CUDA, AI/ML custom libraries, and Linux systems optimization/administration.
Compensation
The Redwood City, CA and New York City, NY base pay range for this role is $270,000.00 - $371,800.00.
The Chicago, IL base pay range for this role is $230,000.00 - $315,700.00.
New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process.
Work Mode
As we grow, we’re excited to strengthen in-person connections and cultivate a collaborative, team-oriented environment. This role is a hybrid position requiring you to be onsite for at least 60% of the working month, approximately 3 days a week, with specific in-office days determined by the team’s manager. The exact schedule will be at the hiring manager's discretion and communicated during the interview process.
Benefits for the Whole You
We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible.
- CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
- Annual benefit for employees that can be used most meaningfully for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
- CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
- Paid time off to volunteer at an organization of your choice.
- Funding for select family-forming benefits.
- Relocation support for employees who need assistance moving to the Bay Area.
- And more!
If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.
Explore our work modes, benefits, and interview process at www.chanzuckerberg.com/careers.