Data Engineer
Imagine if you had the skills, knowledge, and teammates to both understand the root of the world’s most pressing problems and build the technologies and companies best positioned to solve them. RA Capital has done exactly that for more than two decades, backing bold ideas in medicines to further human health and now expanding into Planetary Health to improve how efficiently we utilize the world’s precious resources.
RA Capital is among the leading providers of capital and services to the most promising innovators in the world. We invest flexibly—seed to IPO and beyond, anywhere in the world—with $10B+ under management and a culture that prizes curiosity, rigor, and collaborative debate. We are investors who not only fund companies but get elbow-deep in building them. From helping them recruit talent, to helping them recruit patients for their studies, to matching them with strategic partners, and even going to Washington to win reforms, RA Capital’s large team has people with nearly every relevant expertise one might need to turn an idea into a cure that actually helps people.
If you live for first-principles problem-solving with great colleagues, thrive on complexity, and want to do meaningful work that ripples across industries and ecosystems, you’ll feel at home at RA Capital. Here, questions are welcomed, ideas are tested, and victories are shared. Even our lawyers are creative and engaging. And don’t get us started on our compliance team’s wicked sense of humor; nothing about what we do is boring.
Are you ready to bring your creativity, discipline, and collaborative spirit to help us invent the future? Join us and you’ll collaborate daily with investors, founders, physicians, biologists, engineers, economists, and reform advocates who think in systems and act with urgency.
Join us to invent a happier, healthier, more productive future, and have fun doing it.
About the Team
RA Capital’s Data Engineering team is responsible for ensuring high-quality, reliable, and accessible data throughout the organization. We emphasize data integrity, compliance, and usability to support strategic decision-making across RA Capital. Our team oversees the complete data lifecycle—partnering with internal stakeholders and external vendors—to build scalable data infrastructure that fuels a data-driven culture.
About the Role
We are seeking a skilled Data Engineer with hands-on data experience and a strong interest in AI/LLM-powered data access to join our Data Engineering team. This role is pivotal in designing and maintaining robust data pipelines and extending data accessibility through AI-driven solutions.
The ideal candidate will possess deep technical knowledge in data engineering and a working understanding of large language model (LLM) systems and the Model Context Protocol (MCP). You’ll help bridge structured enterprise data with AI interfaces that power self-service and natural language query workflows.
Responsibilities
- Design, build, and optimize end-to-end enterprise data pipelines for ingesting and integrating vendor data.
- Develop and maintain robust ETL processes and data integrations between data warehouses (e.g., Databricks) and downstream applications.
- Write production-level Python and SQL code to standardize, reconcile, and match healthcare data, applying NLP and ML techniques when needed.
- Develop scalable data models in Databricks to support efficient reporting and analytics across clinical, financial, and operational datasets.
- Implement rigorous data quality controls and validation checks to ensure data accuracy and compliance.
- Collaborate with external data vendors to define delivery specifications and transformation logic.
- Partner with internal IT, analytics, and business stakeholders to align data efforts with organizational objectives.
- Work closely with AI/ML engineers and product teams to support LLM-based data access layers built on top of Hasura or similar GraphQL engines.
- Contribute to the integration and evaluation of Model Context Protocol (MCP) in real-world applications, enabling scalable, secure, and interpretable LLM usage.
- Document data architectures, pipelines, workflows, and processes for both technical and non-technical audiences.
- Provide Tier 1 support for monitoring data flows and resolving pipeline or integration issues.
- Ensure ongoing compliance with data governance and security standards.
Key Skills & Experience
- 1–2+ years of experience in a data engineering role.
- Expertise in building scalable ETL/ELT pipelines and data integration workflows.
- Strong skills in Python, SQL, and Spark. Experience with Java is a plus.
- Hands-on experience with Databricks; familiarity with AWS (S3, EC2, EBS) preferred.
- Strong understanding of data validation, quality assurance, and compliance practices.
- Exposure to LLM applications and AI-driven data interfaces, particularly in structured enterprise data environments.
- Familiarity with Model Context Protocol (MCP) and how it supports contextual integrity, auditability, and chain-of-thought in AI/LLM-based data access.
- Proven ability to manage external data vendors and collaborate on schema, format, and delivery improvements.
- Ability to clearly convey technical details to non-technical stakeholders and align data projects with business needs.
Key Requirements
- Master’s degree or higher from a top Computer Science or Data Science program.
- 1–2+ years of experience in data engineering, software development, and managing production-grade pipelines.
- Must be based in the Boston area.
- Ability to work a hybrid schedule in our Boston office.
- Must be authorized to work in the United States.