Principal Data Scientist, Science

Chicago (Flex)

The Chan Zuckerberg Initiative was founded by Priscilla Chan and Mark Zuckerberg in 2015 to help solve some of society’s toughest challenges — from eradicating disease and improving education to addressing the needs of our local communities. Our mission is to build a more inclusive, just, and healthy future for everyone.

The Team

CZI supports the science and technology that will make it possible to help scientists cure, prevent, or manage all diseases by the end of this century. While this may seem like an audacious goal, in the last 100 years, biomedical science has made tremendous strides in understanding biological systems, advancing human health, and treating disease. 

Achieving our mission will only be possible if scientists are able to better understand human biology. To that end, we have identified four grand challenges that will unlock the mysteries of the cell and how cells interact within systems — paving the way for new discoveries that will change medicine in the decades that follow:

  • Building an AI-based virtual cell model to predict and understand cellular behavior
  • Developing state-of-the-art imaging systems to observe living cells in action
  • Instrumenting tissues to better understand inflammation, a key driver of many diseases
  • Engineering and harnessing the immune system for early detection, prevention, and treatment of disease

CZI’s work in science includes grantmaking programs, open-source software development, and close collaboration with the Chan Zuckerberg Biohub Network. The CZ Biohub Network includes the San Francisco, Chicago, and New York Biohubs as well as the Chan Zuckerberg Imaging Institute. CZI also collaborates with institutional partners like the Kempner Institute for the Study of Natural & Artificial Intelligence at Harvard University. Join us in accelerating science.

The Opportunity

The Principal Data Scientist will lead a team that defines and creates the datasets and covariates required to train a Virtual Cell model that understands how each cell type functions at a molecular level under healthy conditions and can predict the impact of genetic or environmental perturbations on cells and cell populations. You and your team will work closely with our Science Program team to obtain through partnerships or generate the required datasets, with Data Engineering and ML Engineering to decide data formats, schemas, and access patterns, and with AI Research to define the annotations, covariates, and quality measures required to use the data.

CZI manages and processes scientific datasets specifically designed to enable biological modeling. We handle single-cell transcriptomic data covering over 100 million fully standardized unique cells and over 15 thousand cryoET tomograms, in imaging datasets as large as 20TB and counting. This year, we are expanding data operations by more than 10x to support a higher volume of imaging, sequencing, literature, and mass spectrometry datasets. These data are available via two public resources, CELLxGENE Discover and the CryoET Portal. These resources provide access to structured, open-source data that tens of thousands of scientists use each month to quickly query data and form hypotheses about how genetic variants in cells impact disease risk, to define drug toxicities, and eventually to discover better therapies.

As the Principal Data Scientist, you will manage a team of data scientists who are embedded in cross-functional, AI-modeling focused teams and are responsible for leading dataset definition and delivery to support modeling work. Success is measured by the speed with which datasets become available, the ease with which they can be used, the suitability of each dataset to the modeling task, and the clarity with which data quality is communicated to the research team. As the manager of a small team, you will be in a player/coach position. Your individual contributions will focus on defining a strategic, integrated dataset design sufficient to realize a Virtual Cell Model, and on ensuring that the individual datasets your team creates for each modeling project make measurable progress towards the integrated design.

What You'll Do

  • Manage a team of data scientists to create high quality, standardized datasets for biological AI modeling and open-source publishing.
  • Establish systems and processes to ensure agility and responsiveness to emerging needs from the rapidly evolving AI biology space.
  • Define a data vision for the Virtual Cell Model, build buy-in with cross-functional partners, and partner to see it incrementally realized as we iterate towards our vision.
  • Discover and define new data generation opportunities, and manage the delivery of those data products to our AI team.
  • Collaborate with ML engineers, AI Researchers, and Data Engineers to deliver datasets to train state-of-the-art generative AI models.
  • Collaborate with engineers, product managers, UX designers, and other data scientists to publish valuable datasets to accelerate community modeling progress.

What You'll Bring

  • 10+ years of experience with biological data, including experience with both imaging and sequencing data modalities.
  • Experience delivering multiple large data products, at least one of which contains a significant component of sequencing and imaging data.
  • 5+ years managing teams, including management of hybrid/remote teams.
  • Experience with big data: extraction, transport, loading, databases, standardization, and data validation.
  • Strong fundamentals in statistical reasoning and machine learning.
  • Experience with image and sequence data analysis and QC best practices.
  • Experience with processing and orchestration pipelines, such as Argo Workflows or Databricks.
  • Excellent written and verbal communication skills.
  • Enthusiasm to ramp up on technologies and learn new domains.
  • Experience working in a multidisciplinary environment (engineering, product, design).

Compensation

The Chicago, Illinois base pay range for this role is $204,850 - $307,700. New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process. 

Pay ranges outside Redwood City are adjusted based on cost of labor in each respective geographical market. 

Benefits for the Whole You 

We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible. 

  • CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
  • Annual benefit for employees that can be used most meaningfully for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
  • CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
  • Paid time off to volunteer at an organization of your choice. 
  • Funding for select family-forming benefits. 
  • Relocation support for employees who need assistance moving to the Bay Area.
  • And more!

If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.

Explore our work modes, benefits, and interview process at www.chanzuckerberg.com/careers.

 


Reasonable Accommodation Notice
CZI provides (and state and federal law requires) reasonable accommodations to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job or to perform your job (reach out to your recruiter or accommodations@chanzuckerberg.com). Examples of reasonable accommodation include making a change to the application process or work procedures, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment.

Applicant Privacy Notice
To learn more about how we use the information you submit, please see our Privacy Notice for Job Applicants.

Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

As set forth in Chan Zuckerberg Initiative’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.


If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.


Voluntary Self-Identification of Disability

Form CC-305
OMB Control Number 1250-0005
Expires 04/30/2026

Why are you being asked to complete this form?

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:

  • Alcohol or other substance use disorder (not currently using drugs illegally)
  • Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
  • Blind or low vision
  • Cancer (past or present)
  • Cardiovascular or heart disease
  • Celiac disease
  • Cerebral palsy
  • Deaf or serious difficulty hearing
  • Diabetes
  • Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
  • Epilepsy or other seizure disorder
  • Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
  • Intellectual or developmental disability
  • Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
  • Missing limbs or partially missing limbs
  • Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
  • Nervous system condition, for example, migraine headaches, Parkinson’s disease, multiple sclerosis (MS)
  • Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
  • Partial or complete paralysis (any cause)
  • Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
  • Short stature (dwarfism)
  • Traumatic brain injury

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.