
Research Engineer, Training Infrastructure Lead
About Goodfire
Behind our name: Like fire, AI holds the potential for both immense benefit and significant risk. Just as mastering fire transformed human history, we believe the safe and intentional development of AI will shape the future of our species. Our goal is to tame this new fire.
Goodfire is an AI interpretability research company focused on understanding and intentionally designing advanced AI systems. We believe advances in interpretability will unlock the next frontier of safe and powerful foundation models and that deep research breakthroughs are necessary to make this possible.
Everything we do is in service of that mission. We move fast, take ownership, and constantly push to improve. We believe in acting today rather than tomorrow. We care deeply about the success of the organization and put the team above ourselves.
Goodfire is a public benefit corporation headquartered in San Francisco with a team of the world’s top interpretability researchers and engineers from organizations like OpenAI and DeepMind. We’ve raised $57M from investors like Menlo, Lightspeed, and Anthropic and work with customers including Arc Institute, Mayo Clinic, and Rakuten.
The role:
We're seeking a senior engineering leader to own and evolve our research platform and training infrastructure. You'll define both the technical vision and the implementation strategy for the systems that power our research breakthroughs.
Key responsibilities:
- Design and build customizable training pipelines that scale from experimentation to production
- Architect and implement large-scale model serving infrastructure for interpretability (reference: NDIF, Garcon)
- Identify and execute on opportunities to dramatically accelerate research velocity
- Lead technical decision-making for infrastructure that supports cutting-edge AI research
Who you are:
Goodfire is looking for experienced individuals who share our deep commitment to making interpretability accessible. We care deeply about building a team that embodies our values:
Put mission and team first
All we do is in service of our mission. We trust each other, deeply care about the success of the organization, and choose to put our team above ourselves.
Improve constantly
We are constantly looking to improve every piece of the business. We proactively critique ourselves and others in a kind and thoughtful way that translates to practical improvements in the organization. We are pragmatic and consistently implement the obvious fixes that work.
Take ownership and initiative
There are no bystanders here. We proactively identify problems and take full responsibility for getting a strong result. We are self-driven, own our mistakes, and feel deep responsibility for what we’re building.
Action today
We have a small amount of time to do something incredibly hard and meaningful. The pace and intensity of the organization are high. If we can take action today or tomorrow, we will choose to do it today.
What we are looking for:
Required experience:
- 5+ years of experience in ML infrastructure, research engineering, and/or systems programming
- Leadership experience as senior architect, tech lead, and/or engineering manager
- Cross-functional expertise bridging research and engineering domains
- Technical proficiency in Python, PyTorch/JAX, and distributed systems
- Production experience deploying and maintaining ML systems at scale
- Mission alignment with advancing AI safety and interpretability
Core competencies:
High-ownership leadership
- Owns broad areas with autonomy, driving architectural and strategic decisions even amid uncertainty
- Balances technical depth with speed, adapting as priorities evolve
Research-to-production mindset
- Bridges fast research iteration with reliable, scalable production systems
- Designs abstractions that preserve flexibility while ensuring robustness
Modern ML & infrastructure expertise
- Deep experience in Python, PyTorch, and large-scale training strategies
- Hands-on with end-to-end ML infrastructure: from experiments to serving
- Strong track record of scaling systems and debugging complex runs
Preferred qualifications:
- Contributions to open-source ML infrastructure projects
- Experience in fast-paced startup or research lab environments
Compensation & benefits:
This role offers a market-competitive salary, equity, and competitive benefits. More importantly, you'll have the opportunity to work on groundbreaking technology with a world-class team on the critical path to ensuring a safe and beneficial future for humanity.
The expected salary range for this position is $200,000 - $400,000 USD.