Senior Bioinformatician – Genomics Data Infrastructure

United States

About Violet Research Institute

Violet Research Institute (VRI) is building the future of personalized medicine for patients with genetic diseases. We're at the frontier of a new era in medicine — one where treatments can be designed for individual patients based on their unique biology. Recent breakthroughs in science, engineering, and regulatory pathways have made this possible, but much of this work remains nascent and distributed across disparate efforts globally. We're unifying, refining, and scaling these efforts into a cohesive platform. For each patient we serve, we deeply understand their biology, then design and manufacture a targeted treatment that can be delivered in months instead of years.

We combine the urgency and execution mindset of a startup with the mission-driven openness of a nonprofit, allowing us to collaborate broadly and move quickly on behalf of the patients we serve. We've brought together leading researchers, engineers, and organizations across omics, therapeutic design, manufacturing, clinical care, and AI to move from insight to action as quickly as possible.

VRI is founded by the family of our first patient, Violet, and is led by Michael Buckley, Siranush Babakhanova, and Steve Turner. Our team is deeply cross-disciplinary and first-principles driven. We value builders, experts, and generalists who are excited to work across domains, challenge conventional approaches, and increase access to personalized medicine.

Location: San Francisco Bay Area (or Remote, US-based preferred)

Type: Full-Time

Compensation: $175k - $225k

The Role

As our founding Bioinformatician, you will be the architect and owner of VRI’s entire genomics data foundation. This is not an analyst role embedded in someone else’s pipeline. You will design, build, and steward the systems that collect, unify, quality-control, and surface all genetic, sequencing, and assay data across the organization, replacing fragmented, ad-hoc processes with a rigorous, reproducible, and scalable data layer.

Your work begins with PacBio long-read whole-genome sequencing and multiomics integration, and grows into a platform that can onboard new patients and indications with speed and consistency. You will partner closely with computational biologists and clinical scientists to make data trustworthy and analysis-ready, so that these experts can make fast, accurate clinical interpretations. Your job is to make their work faster, more reliable, and fully reproducible.

You will also own the data infrastructure end to end: the integrations between systems, the data ontology and structure, and everything else needed to ensure clean data and reproducible processes and analyses. The data infrastructure you create will have a direct, near-term impact on real patients’ lives.

 

What You’ll Own

Genomics Data Management & Stewardship

  • Own the full lifecycle of VRI’s genomics data, from raw sequencer output (FASTQ, BAM/CRAM, VCF) through QC, storage, versioning, and retrieval, as the single accountable person for data integrity across all datasets
  • Define and enforce data standards, naming conventions, metadata schemas, and ontologies for all data types: sequences, variant calls, splicing data, and experimental assay results
  • Build and maintain a centralized, queryable genomics data lake that unifies heterogeneous inputs from internal labs and CRO partners (US, Israel, China) into a single, analysis-ready data model
  • Establish sample tracking, data lineage documentation, and versioning protocols so every result is traceable back to its source
  • Manage cloud storage strategy (AWS S3 or GCP) across hot, warm, and cold tiers, balancing cost, accessibility, and HIPAA-compliant security
  • Create and maintain an internal data catalog documenting all datasets, pipeline versions, and transformation logic so any scientist can understand what data exists and how it was produced

Pipeline Development & Data Engineering

  • Design and build production-grade, reusable pipelines for ingesting and processing PacBio long-read WGS data, including phased genome assembly, structural variant calling, and SNP/indel detection
  • Build ETL workflows that clean, normalize, and integrate diverse data modalities (sequencing reads, RNA/splicing data, and assay metadata) into unified, analysis-ready formats
  • Automate QC steps to surface data anomalies early; monitor data quality continuously across sequencing batches and CRO handoffs
  • Establish code quality standards, testing protocols, and deployment practices (version control, containerization) that will scale as the team grows
  • Maintain and develop internal database systems, including our proprietary VRI OS platform used for experiment tracking: contribute to data integrity and system upkeep, and build custom tools and interfaces to support research workflows
  • Integrate physics-based thermodynamic models and predictive algorithms to forecast therapeutic performance and guide design decisions
  • Develop and apply design criteria and ranking systems to evaluate therapeutic candidates computationally before advancing to wet lab testing
  • Build and maintain algorithms that bridge computational predictions with experimental validation, optimizing the design-to-testing pipeline

Multiomics Integration

  • Integrate multi-layered genomics data (DNA, RNA-seq, long-read RNA, splicing) with proteomics, metabolomics, and mass spectrometry data (LC-MS, MS/MS) into coherent, patient-centric multiomics datasets
  • Query and harmonize large-scale population cohorts (UK Biobank, Mount Sinai Million, and similar) to contextualize patient findings
  • Partner with computational biologists and clinical scientists to surface analysis-ready datasets, enabling and supporting their interpretation work

Insight Delivery & Reporting

  • Build automated reporting pipelines that push structured summaries of data quality, pipeline status, and batch results to scientific stakeholders, thereby replacing manual handoffs
  • Develop QC dashboards to surface data quality metrics, pipeline status, and anomaly alerts in real time
  • Directly support IND filings by preparing relevant datasets and written reports and descriptions

Continuous Improvement

  • Actively monitor the bioinformatics landscape — using AI-assisted tools where applicable — to identify emerging algorithms and platforms that can sharpen VRI’s data infrastructure
  • Lay the foundation for future bioinformatics hires by embedding well-documented, reproducible data practices from day one

Requirements

Must Have

  • 4+ years of hands-on bioinformatics experience in a research or biotech environment, with a strong focus on genomics data management and pipeline engineering
  • Proven experience owning genomics data end-to-end — not just running analyses, but building the systems and standards that make data trustworthy and reusable
  • Strong fluency in genomics file formats and toolchains: FASTQ, BAM/CRAM, VCF, BED; variant callers (GATK, DeepVariant, PBSV); assembly tools (hifiasm or equivalent)
  • Demonstrated experience with PacBio long-read WGS data and associated long-read tooling
  • Proficiency in Python; experience building and maintaining production-grade pipelines with workflow managers (Nextflow, Snakemake, or WDL)
  • Hands-on experience with cloud data infrastructure: AWS S3 or GCP, data lake design, pipeline orchestration, and HIPAA-compliant storage
  • Experience querying and integrating biobank-scale datasets (UK Biobank or similar)
  • Strong organizational skills — you naturally document your work, build systems others can use, and take ownership of data quality without being asked

Preferred

  • Experience with RNA-seq and long-read RNA analysis, including pre-mRNA processing and splicing characterization
  • Familiarity with LIMS systems (Benchling, LabVantage, or similar) and data governance / FAIR data frameworks
  • Familiarity with containerization tools (Docker, Singularity) and CI/CD practices
  • Exposure to siRNA, ASO, or other therapeutic modality-specific bioinformatics
  • Experience in a seed or early-stage biotech; comfort building infrastructure from scratch

Behavioral Essentials

  • Execute independently on loosely specified tasks; you are self-directed
  • Ask for help only when truly blocked, communicating clearly what is needed and what you have already tried
  • Thrive in early-stage, ambiguous, high-pace environments where the path is built as you walk it
  • Mission-driven with genuine, active care for patient impact (a daily operating principle at VRI)

AI, Tools & Operating Environment

At VRI we genuinely embrace AI at every step of the process. Claude and other AI tools are used throughout the day, across every function. Computational fluency and comfort with AI-assisted analysis and literature synthesis are expected. If you treat AI as a novelty or an occasional aid, this is not the right environment.

How We Hire

We are looking to hire immediately and are moving quickly. Our anticipated process can take as little as 5 days: Apply → Initial Recruiter Call → Hiring Manager Interview → Technical Stakeholder Interview → Executive Director Interview → Offer.

Compensation & Benefits

VRI provides competitive compensation based upon experience, qualifications, and role scope, starting at $175k. We also offer a full suite of benefits.

 
