Senior/Staff Scientist, Data Science

Berkeley, CA

About Glyphic:

At Glyphic Biotechnologies, we plan to create the protein revolution for which scientists and researchers have been waiting. We are developing a massively parallel, single-molecule proteome sequencing platform that will transform life science discovery and usher in a new era of insights into human biology and disease. To date, we have raised >$50M from venture partners and non-dilutive grant funding to achieve our vision of next-generation proteome sequencing.

What we are looking for in you

Glyphic is seeking a highly motivated and experienced Senior/Staff Data Scientist to help advance our cutting-edge single-molecule proteome sequencing platform, which has the potential to transform how we understand biology and develop new medicines.

We're looking for a Senior/Staff Data Scientist who's excited about solving complex, real-world problems with cutting-edge technology. You'll work directly with our CTO and a collaborative team of scientists, engineers, and bioinformaticians who are passionate about pushing the boundaries of what's possible.

This is a California-based hybrid role. On average, you'll spend ~20% of your time on-site with the team in Berkeley, CA, with flexibility for additional collaboration as projects require.

What you’ll do

Data Analysis and Insight Generation:

  • Design and implement novel algorithms to analyze proteomics data that no one has ever seen before.
  • Develop machine learning models that can extract meaningful insights from complex, noisy biological signals.
  • Develop and optimize algorithms for analyzing high-dimensional chemistry and NGS data, including single-cell, spatial, and LCMS data outputs.
  • Build models that reveal how parameters and molecular interfaces drive outcomes, including surface interactions and molecule-target binding.
  • Design and execute biostatistical analyses using Python and/or R to uncover significant trends, model experimental outcomes, and inform data-driven decision-making.
  • Apply machine learning to guide experiment design, identify key parameters, and optimize workflows for efficiency and reproducibility.
  • Develop clear, insightful visualizations that make complex, high-dimensional results understandable and actionable for scientists and stakeholders.
  • Help define metrics and visualizations that clarify high-dimensional relationships for scientists and stakeholders.
  • Partner with wet lab, hardware, and software teams to translate experimental goals into computational strategies.

Pipelines and Automation:

  • Create ETL pipelines that clean, normalize, and integrate diverse datasets (sequencing reads, LCMS spectra, metadata) into analysis-ready formats.
  • Combine off-the-shelf pipelines (basecalling, variant calling, deconvolution) with custom scripts to deliver end-to-end solutions.
  • Continuously improve throughput and data quality by automating QC steps and integrating feedback from experiments.
  • Establish best practices for code quality, testing, and deployment that will scale with our growing team.

What you need

Required:

  • PhD in Computer Science, Bioinformatics, Computational Biology, Biostatistics, or a related field with 4+ (Senior) or 6+ (Staff) years of hands-on experience.
  • Proven ability to model and interpret high-dimensional datasets with numerous interacting variables, uncovering statistically robust patterns and causal relationships.
  • Competency in chemistry data science (e.g., interpreting LCMS data, utilizing deconvolution tools, understanding surface chemistry and molecule-target interactions).
  • Competency in next-generation sequencing, including familiarity with multi-omics, error modeling, and basecalling.
  • Expertise in Python and/or R for biostatistical analysis, including data wrangling, statistical modeling, and visualization of high-dimensional experimental results.
  • Experience designing ML models for experimental data and deploying pipelines (Snakemake, Nextflow).
  • Familiarity with ML frameworks (PyTorch, TensorFlow) and data science libraries (pandas, numpy, scipy).
  • Experience building automated data pipelines and infrastructure for scalable analysis (cloud, Docker/Kubernetes).
  • Experience with cloud platforms (AWS, GCP, or Azure) and containerization tools (Docker, Kubernetes).
  • Proficiency with data visualization tools (matplotlib, seaborn, plotly) and Jupyter notebooks.
  • Familiarity with version control (git) and pipeline workflow systems (Snakemake, Nextflow, etc.).

Nice to have:

  • Ability to work in performant languages (C++, Rust, Julia, or CUDA).
  • Ability to develop solutions that optimize the utilization of large-scale data storage, cloud processing infrastructure, and distributed computing.
  • Direct proteomics experience (mass spectrometry, multiplex assays, etc.).
  • Deep learning experience with time-series data, signal processing, or sequence modeling.
  • Ability to build and deploy scalable ML pipelines using PyTorch/TensorFlow for real-time protein sequence analysis.
  • Experience with MLOps tools and practices for model deployment and monitoring.
  • Experience building commercially successful life science tools that other scientists actually use and love.
  • Previous startup or fast-paced industry (e.g., skunkworks) experience.

We’re looking for teammates with:

  • Excellent interpersonal skills – capable of building strong relationships and communicating effectively with stakeholders at all levels.
  • High emotional and analytical intelligence – able to navigate complex team dynamics, partnerships, and challenges with creativity and logic.
  • Resourceful adaptability – operates with urgency, remains flexible in evolving environments, and thrives in ambiguity.
  • Collaborative spirit – enjoys working across disciplines and explaining complex concepts to diverse audiences.

What you can expect

Work environment:

  • Flexible hybrid schedule with quarterly team gatherings
  • Access to cutting-edge technology and computational resources
  • Collaborative culture where your ideas and expertise are valued
  • Direct impact on product development and company direction

Professional growth:

  • Work on problems that don't have solutions in textbooks
  • Your algorithms will directly influence experimental design and product development
  • Debug and optimize real experimental results, not just theoretical datasets
  • Bridge the gap between cutting-edge research and practical applications
  • Learn from a diverse team of world-class scientists and engineers
  • Contribute to first-of-their-kind technologies, high-impact publications, and patents

Compensation

Estimated Base Salary: $168,000 - $238,000

This is the pay range we reasonably expect to pay for this position. Individual compensation is based on various factors, including experience, education, skillset, and geographic location. This range is for the SF Bay Area, California location and may be adjusted to the labor market in other geographic areas.

Benefits and Perks:

  • Employee Stock Option Plan
  • 100% Health Plan Coverage for Employees & Dependents (Medical, Dental, & Vision)
  • Employer Retirement Contributions to 401(k)
  • Generous Paid Time Off
  • Paid Maternity and Paternity Leave
  • Health & Wellbeing Program
  • Office Snacks and Beverages
  • Regular Team Bonding Activities

We are an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Individuals seeking employment at Glyphic Biotechnologies are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.
