Principal ML Research Engineer
Your Impact at LILA
Lila is building a platform where AI and automation co-evolve to solve the hardest problems in science. Within Life Science AI (LSAI), we are launching a new AI for Cell Biology team to develop autonomous-science capabilities for cellular and tissue biology. These capabilities span single-cell omics, perturbation biology, spatial profiling, imaging, genetics, and multi-modal experimental data, and they integrate deep biological expertise with foundation modeling and agentic systems.
We are seeking a Principal ML Research Engineer to be the founding engineering leader on this team. This is a 0→1 hands-on role. You will build and operate the engineering platform that the team's research programs run on: domain data, domain-specific models, shared specialist-model serving and inference, agentic infrastructure, and the evaluation harness. That platform integrates cell-biology research with Lila's central autonomous-science platform, meaning its core-model, agentic-systems, and experimental-automation infrastructure that closes the loop between AI reasoning and the lab. You will work closely with Lila's central AI Platform, Data Platform, and autonomous-lab engineering teams to leverage and extend core Lila infrastructure rather than rebuild it, and you will co-develop the technical direction of the team with the VP of AI for Cell Biology and its ML Scientists as you build.
Cell- and tissue-scale biology sits at an open frontier of AI for science. The field has produced strong specialist models across sub-domains (single-cell foundation models, molecular structural prediction, perturbation response, cellular imaging, pathway and ligand–receptor inference), but the engineering platform that makes these models reliably composable, the domain data that grounds them, and the evaluation that connects their outputs to autonomous experimentation are still being defined. We have a working point of view on what that platform looks like: domain-specific data curation and accessibility; fine-tuning and (where warranted) training of domain-specific models on cell- and tissue-resolution data; shared specialist-model serving; a unified reasoning-trace and tool-call schema; evaluation-harness instrumentation; and the agentic infrastructure for rollout generation, tool orchestration, and rubric grading that the team's research programs share. You will refine, challenge, or replace it. The platform choices you make will shape what Lab-in-the-Loop autonomous science looks like at cell and tissue scale.
This is a senior IC role for someone who wants to build: someone with the engineering depth to ship the infrastructure that makes cell-biology research programs thrive and the judgment to co-author tech-stack strategy with the team's scientific leads as the platform takes shape.
What You'll Be Building
- Build and operate the domain data platform. Stand up the curation, accessibility, lineage, schema, and versioning infrastructure for the multi-modal scientific data the team's research programs depend on: single-cell, multi-omics, spatial, imaging, perturbation, and genetics. Make complex domain data discoverable and queryable for ML scientists and computational biologists. Steward the reasoning-trace and tool-call schema that lets domain data and downstream traces outlive any single program.
- Build and operate the shared specialist-model serving and fine-tuning stack. Serve the field's strongest specialist biology models (single-cell foundation models, structural prediction, perturbation, spatial, imaging, pathway and ligand–receptor models) as composable, versioned tools the team's research programs share. Build the infrastructure to fine-tune these specialists on cell- and tissue-resolution data, and (where warranted by evaluation evidence) to train new domain-specific models.
- Build and operate the shared agentic infrastructure. Stand up the rollout-generation, tool-orchestration, rubric-grading, and trace-QC agents that the team's research programs share. Set the standards by which agentic workflows are reproducible, observable, evaluable, and safe to scale.
- Build and operate the cross-program evaluation harness. Build the instrumentation that gauges progress across the team's research programs and that connects team-internal metrics to Lila's broader scientific evaluation suite. Benchmarks instrumented here outlive any single program and become part of Lila's standing scientific evaluation suite.
- Leverage and extend Lila's central AI Platform and Data Platform. Partner with Lila's central AI Platform, Data Platform, and autonomous-lab engineering teams to extend core Lila infrastructure for cell-biology-specific needs rather than rebuild it. Architect how cell-biology research feeds into and benefits from Lila's foundation-model, agentic-systems, and experimental-automation infrastructure, and how cell-biology research outputs flow back into Lila's broader autonomous-science capability. Push improvements back into the central platforms where the team's work generalizes.
- Co-develop the technical platform direction. Partner with the team's scientific leads to shape the engineering architecture end-to-end: domain data, specialist-model serving, agentic infrastructure, evaluation, and integration into Lila's Lab-in-the-Loop autonomous-science lifecycle. This is shared authorship of platform strategy alongside hands-on building, not in place of it.
- Set engineering standards and grow the engineering bench. Set the team's engineering culture — code quality, reproducibility, deployment hygiene, observability, MLOps practice — and mentor research engineers across the team's research programs and applications horizontal. Externally, represent Lila's AI for Cell Biology platform engineering through open-source contributions, conference participation, and recruiting top-of-funnel.
What You'll Need to Succeed
- Education and experience. Advanced degree (MS or PhD) in Computer Science, Machine Learning, Engineering, or a related quantitative field, or equivalent industry track record, with 8+ years of ML platform, infrastructure, or research-engineering experience and a record of shipping and operating production ML systems.
- Hands-on ML platform building. Deep, hands-on experience building and operating the ML platforms that research teams build on: model serving, fine-tuning and training pipelines, agentic orchestration, evaluation harnesses, and data pipelines. Comfortable both prototyping new infrastructure for the team and operating it at research scale, with central platform teams as the path to long-lived production services.
- Domain data engineering depth. Hands-on experience designing, operating, and making accessible data pipelines and data infrastructure for high-dimensional multi-modal scientific data at scale: single-cell, spatial, imaging, multi-omics, genetics, or comparable scientific imaging/sequencing modalities. Comfortable owning curation, lineage, schema, and discoverability for research teams.
- Specialist-model serving, fine-tuning, and integration. Hands-on experience serving heterogeneous specialist models (different frameworks, hardware profiles, inference patterns) as composable tools and fine-tuning or adapting them on domain-specific data. Comfort integrating multiple specialist models into end-to-end reasoning systems.
- Agentic and autonomous-science systems experience. Demonstrated work on agentic, active-learning, or closed-loop systems — particularly those that orchestrate scientific tools, plan or execute experiments, or reason over scientific processes, and ideally those coupled to automated or autonomous laboratory infrastructure.
- Modern ML systems tooling. Strong fluency in PyTorch (or JAX/TensorFlow); large-scale data loading; inference-time optimization; interactive scientific workflows; modern observability and deployment practice. Strong software engineering fundamentals: Python, containers, Kubernetes, CI/CD, infrastructure-as-code.
- Cross-functional collaboration. Strong track record of collaboration across ML scientists, computational biologists, experimental scientists, and central AI/ML, data, and platform-engineering teams. Bilingual translation between platform engineering and cell biology is a daily activity.
Bonus Points For
- Hands-on experience composing specialist biology models (e.g., single-cell foundation models, structural prediction models, perturbation models, spatial/imaging models, pathway and ligand–receptor models) into multi-step reasoning systems.
- Experience standing up shared model-serving stacks for heterogeneous specialist models with different frameworks, hardware profiles, and inference patterns.
- Experience building, fine-tuning, or contributing to domain-specific foundation models in biology, chemistry, or scientific imaging — useful where off-the-shelf specialists aren't good enough on cell- and tissue-resolution data.
- Experience designing data accessibility, discoverability, and lineage tooling for complex scientific datasets shared across multiple research programs.
- Experience operating against or contributing to shared central platform-engineering teams inside a large AI/ML organization.
- Experience with closed-loop or Lab-in-the-Loop workflows where computational predictions drive experimental decisions and experimental results feed back to retrain or retune models.
- Experience building or operating reasoning-trace, rubric-grading, or evaluation pipelines for agentic systems at scale.
- Experience with large-scale distributed training infrastructure (cloud or on-prem clusters) — useful but not required, since this role primarily consumes rather than operates such infrastructure.
- Open-source contributions to ML platform tooling, scientific computing frameworks, or biological modeling libraries.
- Prior experience as a founding engineer or technical lead on a new team.
About LILA
Lila Sciences is building Scientific Superintelligence™ to solve humankind's greatest challenges. We believe science is the most inspiring frontier for AI. Rather than hard-coding expert knowledge into tools, LILA builds systems that can learn for themselves.
LILA combines advanced AI models with proprietary AI Science Factory™ instruments into an operating system for science that executes the entire scientific method autonomously, accelerating discovery at unprecedented speed, scale, and impact across medicine, materials, and energy. Learn more at www.lila.ai.
Guided by our core values of truth, trust, curiosity, grit, and velocity, we move with startup speed while tackling problems of historic importance. If this sounds like an environment you'd love to work in, even if you don't meet every qualification listed above, we encourage you to apply.
We’re All In
Lila Sciences is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.
Information you provide during your application process will be handled in accordance with our Candidate Privacy Policy.
A Note to Agencies
Lila Sciences does not accept unsolicited resumes from any source other than candidates. The submission of unsolicited resumes by recruitment or staffing agencies to Lila Sciences or its employees is strictly prohibited unless contacted directly by Lila Sciences’ internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Lila Sciences, and Lila Sciences will not owe any referral or other fees with respect thereto.