Data Manager — Multimodal Medical Foundation Models

Bangalore

About the Role

You will lead data operations for a cutting-edge research group developing 3D medical multimodal foundation models and agentic clinical AI systems. These models rely on extremely high-quality, well-structured, and compliant datasets, including 3D medical imaging volumes (MRI, CT, PET), clinical text corpora, annotations, and multimodal metadata.

Your job is to own the end-to-end data lifecycle: acquisition, ingestion, cleaning, versioning, labeling, quality control, governance, and delivery to researchers. You are the central node ensuring our foundation model teams and medical agent teams have clean, scalable, well-documented data pipelines.

This is a pivotal, foundational role: without great data, large models cannot be great.

 

What You Will Work On

Multimodal Medical Data Ops

  • Oversee ingestion and processing of 3D medical volumes (DICOM, NIfTI, MHA) and associated clinical texts.
  • Build automated pipelines for metadata extraction, de-identification, slice/series validation, and cohort structuring.
  • Manage large-scale internal datasets and external research datasets (BraTS, LiTS, MIMIC-CXR, CheXpert, MosMed, etc.).

Data Infrastructure & Versioning

  • Implement scalable data storage, cataloging, and retrieval systems for multimodal training data.
  • Own dataset version control, lineage tracking, reproducibility, and dataset documentation.
  • Collaborate with ML systems engineers on high-throughput data loaders, sharding strategies, and caching mechanisms.

Annotation & Labeling Programs

  • Lead medical annotation workflows with radiologists, medical students, and labeling vendors.
  • Create guidelines for ROI labeling, segmentation, captioning, report alignment, and case-level curation.
  • Build semi-automated labeling pipelines using model-assisted tools.

Data Quality, Compliance & Governance

  • Enforce strict standards on data quality, completeness, consistency, and bias control.
  • Ensure adherence to medical data privacy, HIPAA-equivalent frameworks, and institutional data-sharing rules.
  • Manage PHI de-identification, audit logs, access control, and compliance approvals.

Collaboration with Research & Engineering

  • Work closely with foundation-model researchers to understand data needs for model training.
  • Partner with agentic system designers to supply structured datasets for clinical reasoning tasks.
  • Collaborate with foundational engineers on data access layers, performance bottlenecks, and dataset optimization.

 

Why This Role Is Critical

  • The foundation model relies on high-quality 3D and textual data at scale.
  • You shape the data pipelines enabling next-generation medical AI agents.
  • You ensure clinical-grade governance, safety, reproducibility, and trust.
  • Your systems become the backbone for research, experiments, and deployments.

For candidates motivated by the intersection of data, healthcare, and machine learning, this is a high-impact opportunity.

 

What We’re Looking For

  • Strong experience managing large multimodal or imaging datasets, ideally medical imaging.
  • Proficiency with DICOM/DICOMweb, NIfTI, PACS, and medical imaging toolkits (dicompyler, pydicom, MONAI, ITK).
  • Experience with ETL pipelines, distributed data systems, and cloud/on-prem storage.
  • Knowledge of metadata standards, ontologies, and text–image linking strategies.
  • Comfort working with Python, SQL, and data tooling (Airflow, Prefect, Dagster, dbt, Delta Lake, etc.).
  • Understanding of data privacy, de-identification, and compliance requirements in healthcare.
  • Strong communication skills and the ability to coordinate between engineers, researchers, clinicians, and data partners.

 

Nice to Have

  • Experience with vector databases, multimodal retrieval, or embedding store design.
  • Familiarity with annotation tools (Labelbox, CVAT, iMerit, custom MONAI Label pipelines).
  • Prior work with clinical NLP datasets or multilingual Indian medical corpora.
  • Experience conducting bias audits, dataset characterization, or quality scoring at scale.
  • Contributions to open datasets, benchmarks, or data documentation frameworks.

 

What We Offer

  • Competitive compensation.
  • Access to one of the most ambitious medical multimodal datasets in the region.
  • Collaboration with scientists building India’s first 3D multimodal medical foundation model.
  • Autonomy to design data systems from the ground up.
  • A mission-driven team working to transform clinical care with agentic AI.

Create a Job Alert

Interested in building your career at SAIGroup? Get future opportunities sent straight to your email.
