Staff / Senior Staff Data Engineer, Real-World Data
About Us
Valo Health is a technology company that is integrating human-centric data and AI-powered technology to accelerate the creation of life-changing drugs for more patients faster. Valo was created with the belief that the drug discovery and development process can and should be faster and less expensive, with a much higher probability of success. We are using models early to fail less often, executing clinical trials to add valuation to the company, and generating fit-for-purpose data to feed back into Valo’s Opal Computational Platform™ as we reinvent drug discovery and development from the ground up. Disease doesn’t wait, so neither can we.
We are a multi-disciplinary team of experts in science, technology, and pharmaceuticals united in our mission to achieve better drugs for patients faster. Valo is committed to hiring diverse talent, prioritizing growth and development, fostering an inclusive environment, and creating opportunities to bring together a group of different experiences, backgrounds, and voices to work together. We achieve the widest-ranging impact when we leverage our broad backgrounds and perspectives to accelerate a new frontier in health. Valo seeks to become the catalyst for the pharmaceutical industry and drive the digital transformation of the industry. Are you ready to join us?
About the Role
As a Staff / Senior Staff Data Engineer, you will join the data engineering core in the Translational Data Sciences group, working with data scientists and engineers building powerful computational tools and answering critical scientific questions about patients, diseases, and drug development.
In this role, you will lead the development, roadmapping, and execution of complex initiatives to transform real-world data (e.g., electronic medical records, biomarkers and biomedical imaging, and text notes) into analysis-ready data products for internal teams. To do so, you will partner with a diverse set of scientists, engineers, and domain experts across traditional industry boundaries. The primary downstream use cases of these data are longitudinal deep learning models of patient trajectories and knowledge graph integration for target identification, statistical genetics, and multi-omics modeling.
What You'll Do...
- Build, maintain, and extend data transformation pipelines and systems to ingest and harmonize third-party EHR data into Valo’s data ecosystems
- Define Valo’s EHR data models and pipelines (Spark, SQL) in a centralized data ecosystem and semi-isolated cloud environments
- Work closely with data providers and in-house data users to integrate third-party EHR data with Valo’s standardized data
- Maintain and extend data integration (standardization & harmonization) & data quality processes to improve quality, reliability, and FAIRness
- Ensure conceptual accuracy and generalizability of data: do standardized derived features represent clinical concepts in repeatable ways?
- Simplify how data scientists access, transform, and use their data
- Promote consistent data usage patterns, including version management, shared ontologies & data dictionaries
- Support internal data users both directly and by composing demos, how-tos, and reference documentation
- Provide technical leadership within the translational data engineering team
- Simplify how data engineers build, maintain, and extend their data pipelines
- Advise colleagues on data transformations and database design
- Provide critical feedback and encourage best practices within the data engineering team
- Participate in the creation and maintenance of technical documentation
What You Bring...
- Degree in computer science, information systems, or data science: Bachelor’s with 8 (Staff) / 10 (Senior Staff) years of experience, MS with 6 / 8 years, or PhD with 5 / 7 years
- 5+ years of experience in a technical role in:
  - SWE / DE: data ingestion, streaming technologies, troubleshooting data pipelines (e.g., Prefect, Airflow), and implementing CI/CD practices
- Production programming experience in Python and SQL; cloud compute and big data tools, e.g., Spark
- 3+ years of experience in a professional role gathering requirements and understanding customers’ and data users’ goals
- Demonstrated experience scoping projects, determining timelines and milestones, delivering end-to-end projects
- Technical project management experience (scoping, defining milestones & timelines) a plus
- Experience with EHR/EMR data and medical coding ontologies (e.g., ICD, ATC, LOINC, SNOMED)
- Nice to have: experience with sparse longitudinal records (e.g., customer or log data) and with historical ontologies (changes to the concepts themselves, as distinct from data provenance), qualitative data, and coding structures
- Experience with data engineering best practices and testing methodologies: data provenance, collaborative development using source control management (Git), code versioning, reproducibility, etc.
More on Valo
Valo Health, LLC (“Valo”) is a technology company built to transform the drug discovery and development process using human-centric data and artificial intelligence-driven computation. As a digitally native company, Valo aims to fully integrate human-centric data across the entire drug development life cycle into a single unified architecture, thereby accelerating the discovery and development of life-changing drugs while simultaneously reducing costs, time, and failure rates. The company’s Opal Computational Platform™ is an integrated set of capabilities designed to transform data into valuable insights that may accelerate discoveries and enable Valo to advance a robust pipeline of programs across cardiovascular metabolic renal, oncology, and neurodegenerative diseases. Founded by Flagship Pioneering and headquartered in Lexington, MA, Valo also has offices in New York, NY. To learn more, visit www.valohealth.com.