Staff Software Engineer, Data Management Team
Biohub is leading the new era of AI-powered biology to cure or prevent disease through its 501(c)(3) medical research organization, with the support of the Chan Zuckerberg Initiative.
The Team
Biohub supports the science and technology that will make it possible to help scientists cure, prevent, or manage all diseases by the end of this century. While this may seem like an audacious goal, in the last 100 years, biomedical science has made tremendous strides in understanding biological systems, advancing human health, and treating disease.
Achieving our mission will only be possible if scientists are able to better understand human biology. To that end, we have identified four grand challenges that will unlock the mysteries of the cell and how cells interact within systems — paving the way for new discoveries that will change medicine in the decades that follow:
- Building an AI-based virtual cell model to predict and understand cellular behavior
- Developing novel imaging technologies to map, measure and model complex biological systems
- Creating new tools for sensing and directly measuring inflammation within tissues in real time to better understand inflammation, a key driver of many diseases
- Harnessing the immune system for early detection, prevention, and treatment of disease
The Opportunity
The Data Management Engineering team builds and delivers APIs for scientific datasets designed specifically to enable biological modeling. The team is responsible for schema design, data management, storage, retrieval, and usability. We handle single-cell transcriptomic data from over 89 million unique cells, more than 15,000 cryo-ET tomograms in imaging datasets as large as 20 TB and growing, and additional imaging, perturbation, and sequencing modalities. Our resources provide access to structured, open-source data used by tens of thousands of scientists each month to quickly query data and form hypotheses about how genetic variants in cells affect disease risk, define drug toxicities, and ultimately discover better therapies.
As a Staff Software Engineer on the Data Management Engineering team, you will design and implement solutions for the data management and access needs of our platforms (CELLxGENE Discover, CryoET, and VCP) and of our Grand Challenges, so that scientists can interrogate our very large and growing corpus of data without needing to download it or have computational expertise. You will work on a collaborative, multidisciplinary team to develop solutions that accelerate our scientist users' workflows and increase the pace of scientific discovery and model development. You will set the direction for how our teams register, schematize, validate, store, monitor, and use petabytes of data for ease of use, search, and modeling. You will also be responsible for upskilling the engineers around you and driving the adoption of technical best practices and sound data design for efficient, effective delivery.
No prior biology experience is required for this role. You will have the opportunity to pair with Computational Biologists to develop solutions for our users and learn about biology from experts on our team.
Our tech stack includes Python, Terraform, OpenMetadata, Elasticsearch, AWS infrastructure, and TileDB.
What You'll Do
- Own, maintain, and continuously improve the data management architecture.
- Implement scalable data warehousing solutions to handle massive volumes of single-cell transcriptomics data and imaging data.
- Ensure data security and compliance with industry standards and regulations.
- Implement optimization strategies such as data partitioning, indexing, and compression to enhance query performance and reduce computational costs.
- Create user-friendly APIs, CLIs, and libraries to enable researchers and scientists to easily access and explore the curated data.
- Develop scalable, maintainable, and testable software systems and participate in team conversations and efforts on engineering excellence.
- Collaborate with product managers, computational biologists, UX designers, and other software engineers to deliver continuous, incremental value for scientists without compromising on software quality.
- Have opportunities to learn about scientific data and technologies, though no prior experience is required!
What You'll Bring
- 8+ years of relevant software experience
- Strong fundamentals in systems design, data structures, algorithms, and object-oriented programming principles.
- Past experience with data processing and orchestration pipelines, such as Argo Workflows or Databricks.
- Past experience managing large-scale data across different data tiers.
- Solid experience with object-oriented and scripting languages, such as Java, C++, Python, or Golang.
- Past experience with big data.
- Experience with infrastructure and automation tools, including Kubernetes, Terraform, and AWS.
- Excellent written and verbal communication skills.
- Enthusiasm to ramp up on technologies and learn a new science domain.
- Experience working in a multidisciplinary environment (engineering, product, design).
- Desirable but not required: experience with scientific computing libraries, such as NumPy and SciPy.
Compensation
The Redwood City, CA base pay range for a new hire in this role is $214,000 - $294,800. New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process.
Pay ranges outside Redwood City, CA; New York City, NY; Chicago, IL; and San Francisco, CA are adjusted based on the cost of labor in each respective geographic market. Your recruiter can share more about the specific pay range for your location during the hiring process.
Better Together
As we grow, we’re excited to strengthen in-person connections and cultivate a collaborative, team-oriented environment. This is a hybrid position that requires you to be onsite for at least 60% of the working month (approximately 3 days a week). The specific in-office days are at the hiring manager's discretion and will be communicated during the interview process.
Benefits for the Whole You
We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible.
- A generous employer match on employee 401(k) contributions to support planning for the future.
- Paid time off to volunteer at an organization of your choice.
- Funding for select family-forming benefits.
- Relocation support for employees who need assistance moving.
If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.