Principal Software Engineer, ML Platform (Stability & Infrastructure)
Isomorphic Labs is applying frontier AI to help unlock deeper scientific insights, faster breakthroughs, and life-changing medicines with an ambition to solve all disease.
The future is coming. A future enabled and enriched by the incredible power of machine learning. A future in which diseases are curtailed or cured starting with better and faster drug discovery.
Come and be part of an interdisciplinary team driving groundbreaking innovation, and play a meaningful role in achieving our ambitious goals as part of an inspiring and collaborative culture.
The world we want tomorrow is the one we’re building today. It starts with the culture at this company. It starts with you.
About Iso
Isomorphic Labs (IsoLabs) was launched in 2021 to advance human health by building on and beyond the Nobel-winning AlphaFold system. Since then, our interdisciplinary team of drug discovery experts and machine learning specialists has built powerful new predictive and generative AI models that accelerate scientific discovery at digital speed.
Our name comes from the belief that there is an underlying symmetry between biology and information science. By harnessing AI’s powerful capabilities, we can use it to model complex biological phenomena to help design novel molecules, anticipate how drugs will perform and develop innovative medicines to treat and cure some of the world’s most devastating diseases.
We have built a world-leading drug design engine comprising AI models that are capable of working across multiple therapeutic areas and drug modalities. We are continually innovating on model architecture and developing cutting-edge capabilities to advance rational drug design.
Every day, and with each new breakthrough, we’re getting closer to the promise of digital biology, and achieving our ambitious mission to one day solve all disease with the help of AI.
Your Impact
We are building the largest foundation models in biotech and applying them immediately to cure disease. You will play a pivotal role in ensuring the reliability and scalability of the foundations that make this possible.
As a Principal Engineer, you will lead the effort to harden our systems, ensuring our groundbreaking AI is built on an unshakeable base. You will work closely with the research and Applied ML teams to keep the infrastructure stable, reliable and able to handle more data and larger models as we grow.
What You Will Do
- You will own the end-to-end strategy for platform reliability, with a specific focus on our accelerator (GPU/TPU) infrastructure and workload orchestration. You will move between high-level architectural design and hands-on systems engineering to eliminate friction in the researcher experience.
- Lead the reliability work for our global job scheduler. You will design and implement a robust "test harness" to safely validate infrastructure upgrades without impacting live research.
- Architect and optimize our next-generation inference services. You will solve core scaling limits, ensuring high-throughput performance and feature parity across our model serving stack.
- Overhaul our logging and monitoring systems to provide radical visibility. You will build proactive alerting and telemetry that identifies systemic failures before they impact research workflows.
- Improve our internal CI/CD stability, targeting a significant reduction in failure rates and faster feedback loops for the engineering organization.
- Contribute to core technical decisions on tooling and architectural design while partnering with science, product, and operations teams to align infrastructure with biotech R&D cycles.
Skills and Qualifications
Essential:
- Proven experience in architecting and managing large-scale AI/ML workloads in a production environment.
- Expertise in cloud compute design, specifically within Google Cloud Platform (GCP).
- Significant experience deploying and managing complex workloads within Kubernetes (GKE).
- Professional familiarity with NVIDIA GPU generations and the intricacies of high-performance compute.
- Strong programming skills and a "reliability-first" approach to software development.
Nice to Have:
- A career history that spans both ML Software Engineering and Infrastructure SRE roles.
- Experience leading multi-disciplinary projects and navigating complex stakeholder requirements in a fast-paced environment.
- Familiarity with workload scheduling, ML efficiency research, and hardware benchmarking.
- Experience with Google TPU generations and specialized ML-driven R&D cycles.
Culture and values
We are guided by our shared values. It's not about finding people who think and act in the same way. These values help to guide our work and will continue to strengthen it.
Thoughtful
Thoughtful at Iso is about curiosity, creativity and care. It is about good people doing good, rigorous and future-making science every single day.
Brave
Brave at Iso is about fearlessness, but it’s also about initiative and integrity. The scale of the challenge demands nothing less.
Determined
Determined at Iso is the way we pursue our goal. It’s a confidence in our hypothesis, as well as the urgency and agility needed to deliver on it. Because disease won’t wait, so neither should we.
Together
Together at Iso is about connection, collaboration across fields and catalytic relationships. It’s knowing that transformation is a group project, and remembering that what we’re doing will have a real impact on real people everywhere.
Creating an extraordinary company
We believe that to be successful we need a team with a range of skills and talents. We're building an environment where collaboration is fundamental, learning is shared and every employee feels supported and able to thrive. We value unique experiences, knowledge, backgrounds, and perspectives, and harness these qualities to create extraordinary impact.
We are committed to equal employment opportunities regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy or related condition (including breastfeeding) or any other basis protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.
Hybrid working
It’s hugely important for us to share knowledge and build strong relationships with each other, and we find it easier to do this if we spend time together in person. This is why we follow a hybrid model, and would require you to be able to come into the office 3 days a week (currently Tuesday, Wednesday, and one other day depending on which team you’re in). If you have additional needs that would prevent you from following this hybrid approach, we’d be happy to talk through these if you’re selected for an initial screening call.
Please note that when you submit an application, your data will be processed in line with our privacy policy.