
Data Engineer
About RTW Investments
RTW Investments, LP (“RTW”) is a global, full life-cycle investment and innovation firm dedicated to solving the most challenging, unmet patient needs. We are focused on company building and identifying transformational and disruptive innovations across the biopharmaceutical and medical technologies sectors. Our investment philosophy combines a deep understanding of disease, biology, medicine, and technology with a comprehensive view of commercial opportunities and challenges. Our talented team brings hands-on expertise in targeted areas of innovation and is dedicated to the diligent exploration and support of emerging breakthroughs in both industry and academia.
Our mission is simple: we power breakthrough therapies that transform the lives of millions.
Overview
RTW Investments is seeking a Data Engineer with a strong interest in Knowledge Graphs (KG) and solid fundamentals in ETL. You’ll help design and maintain lightweight ontologies and schemas, build reliable data pipelines in Databricks on Azure, and support graph-backed use cases (entity linking, relationship modeling, semantic search). This role is ideal for someone early in their career who enjoys structured data modeling, writing clean Python/SQL, and collaborating with senior engineers and domain experts.
This role is a unique opportunity to become an important part of the team at RTW.
Key Responsibilities:
- Implement and maintain basic ETL/ELT pipelines on Databricks (PySpark, SQL, Delta Lake) to ingest, transform, and publish curated datasets (a minimal sketch follows this list).
- Contribute to KG modeling: draft and extend ontologies/taxonomies, define schemas (entities, relationships, properties), and document naming conventions.
- Build “graph ETL” flows to load nodes/edges into a KG tool (e.g., Stardog or Neo4j) from tabular sources (CSV, Delta tables), including upsert logic and basic data quality checks.
- Author queries over the graph (e.g., Cypher or SPARQL) to validate relationships and support downstream analytics.
- Collaborate with data scientists/analysts to understand entity definitions, resolve identity (de-duplication, matching), and map source systems to the KG.
- Maintain reproducible, version-controlled jobs (Git) and contribute to simple CI checks (lint, tests).
- Write clear technical docs (schemas, lineage notes, how to run jobs) and contribute to the team knowledge base.
- Follow security and governance basics on Azure (e.g., Key Vault for secrets; proper access to ADLS Gen2).
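To give candidates a concrete feel for the day-to-day work, the sketch below shows the kind of curated-ingest job described above, assuming a hypothetical raw CSV drop in ADLS Gen2 and a hypothetical Delta table named curated.companies; the paths, columns, and quality rule are placeholders rather than our actual schema.

```python
# Minimal sketch of a curated-ingest job on Databricks (PySpark + Delta Lake).
# Paths, column names, and the quality rule are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in Databricks notebooks/jobs

RAW_PATH = "abfss://raw@<storage-account>.dfs.core.windows.net/companies/"  # hypothetical ADLS Gen2 location
CURATED_TABLE = "curated.companies"                                         # hypothetical Delta table

raw = spark.read.option("header", True).csv(RAW_PATH)

# Basic transform: normalize names, parse dates, drop duplicate keys.
curated = (
    raw.withColumn("company_name", F.trim(F.col("company_name")))
       .withColumn("founded_date", F.to_date("founded_date", "yyyy-MM-dd"))
       .dropDuplicates(["company_id"])
)

# Simple data-quality gate: fail the job if required keys are missing.
missing_ids = curated.filter(F.col("company_id").isNull()).count()
if missing_ids > 0:
    raise ValueError(f"{missing_ids} rows are missing company_id; aborting publish")

# Publish as a managed Delta table for downstream consumers.
(
    curated.write.format("delta")
           .mode("overwrite")
           .option("overwriteSchema", "true")
           .saveAsTable(CURATED_TABLE)
)
```

In practice a job like this would run as a scheduled Databricks Workflow, with the quality gate deciding whether the curated table gets refreshed.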
Required Qualifications:
- 2–3 years of experience in data engineering, analytics engineering, or similar (internships/co-ops count).
- Proficiency in Python and SQL; comfort with PySpark for distributed transforms.
- Hands-on experience with Databricks (notebooks, jobs/workflows) and Delta Lake fundamentals.
- Working knowledge of Azure data services (at least ADLS Gen2 and Key Vault).
- Foundational KG concepts: nodes/edges/properties, ontologies/taxonomies, schemas; ability to explain how a table maps to a graph model.
- Exposure to at least one KG tool or language (e.g., Neo4j/Cypher, RDF/OWL, SPARQL)—academic or project experience is acceptable.
- Strong attention to detail, documentation habits, and version control (Git).
Nice-to-Have Skills:
- Neo4j ecosystem (Neo4j Desktop, Aura, APOC, py2neo), Stardog, or managed graph services on Azure/AWS.
- RDF/OWL, SHACL for schema/constraint validation, or GraphQL for serving graph data.
- Basic data quality frameworks (expectations, schema checks) and lineage tools.
- Azure Databricks Workflows, Unity Catalog basics, or orchestration familiarity (ADF/Airflow).
- Simple containerization (Docker), Terraform, and CI/CD exposure (GitHub Actions/Azure DevOps).
- Domain modeling experience (designing entity/relationship diagrams) in any industry.
What You’ll Work With (Tech Stack):
- Databricks (PySpark, SQL, Delta Lake, Workflows)
- Azure (ADLS Gen2, Key Vault; plus RBAC fundamentals)
- Python (pandas, PySpark)
- Knowledge Graph tools (Stardog, Neo4j, or others; a minimal load sketch follows this list)
- Git/GitHub (branching, PRs, code reviews)
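For the knowledge-graph side of the stack, a typical “graph ETL” load follows the pattern sketched below, using the official neo4j Python driver with an UNWIND + MERGE upsert so re-runs are idempotent. The labels (Company, Drug), properties, and connection details are illustrative assumptions, not our production model; a Stardog-backed setup would follow the same shape with RDF/SPARQL instead of Cypher.

```python
# Minimal "graph ETL" sketch: upsert company and drug nodes plus a DEVELOPS edge
# into Neo4j from rows exported out of a curated Delta table.
# Labels, property names, and connection details are illustrative only.
from neo4j import GraphDatabase

URI = "neo4j://localhost:7687"            # hypothetical endpoint; in practice read from Azure Key Vault
AUTH = ("neo4j", "<password-from-key-vault>")

UPSERT_CYPHER = """
UNWIND $rows AS row
MERGE (c:Company {company_id: row.company_id})
  SET c.name = row.company_name
MERGE (d:Drug {drug_id: row.drug_id})
  SET d.name = row.drug_name
MERGE (c)-[:DEVELOPS]->(d)
"""

def load_rows(rows, batch_size=1000):
    """Upsert node/edge batches; MERGE keeps the load idempotent on re-runs."""
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            for start in range(0, len(rows), batch_size):
                session.run(UPSERT_CYPHER, rows=rows[start:start + batch_size])

if __name__ == "__main__":
    sample = [{
        "company_id": "C001", "company_name": "Example Biotech",
        "drug_id": "D042", "drug_name": "examplimab",
    }]
    load_rows(sample)
```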
Success in This Role (First 90 Days):
- Ship: Implement a small but production-ready pipeline in Databricks that lands curated data to Delta with basic quality checks.
- Model: Propose and document a simple ontology/schema for one business domain and load a working slice into a KG tool.
- Query: Demonstrate useful Cypher/SPARQL queries that validate relationships and answer a stakeholder question (an example follows this list).
- Document: Produce clear runbooks and schema docs that others can follow.
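As one example of a relationship-validation query (run against the same illustrative Company/Drug model used in the load sketch above, not a prescribed deliverable), a first pass might look for companies that ended up with no DEVELOPS edge after a load, which usually signals a mapping or identity-resolution gap:

```python
# Example validation query: companies with no DEVELOPS relationship.
# Uses the illustrative labels/properties from the load sketch above.
from neo4j import GraphDatabase

ORPHAN_CHECK = """
MATCH (c:Company)
WHERE NOT (c)-[:DEVELOPS]->(:Drug)
RETURN c.company_id AS company_id, c.name AS name
ORDER BY name
LIMIT 25
"""

with GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "<password>")) as driver:
    with driver.session() as session:
        for record in session.run(ORPHAN_CHECK):
            print(record["company_id"], record["name"])
```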
What We Value:
- Curiosity about graph modeling and how semantics improve analytics.
- Pragmatism—start simple, iterate, measure.
- Clear communication, code readability, and consistent documentation.
- Ownership and a growth mindset; you seek feedback and improve quickly.