Data Manager — Multimodal Medical Foundation Models at SAIGroup
Bangalore | Full-time | Product - Concert AI | Posted 25 days ago
<p><strong>About the Role</strong></p>
<p>You will lead data operations for a cutting-edge research group developing <strong>3D medical multimodal foundation models</strong> and <strong>agentic clinical AI systems</strong>. These models depend on high-quality, well-structured, and compliant datasets, including <strong>3D medical imaging volumes (MRI, CT, PET)</strong>, <strong>clinical text corpora</strong>, <strong>annotations</strong>, and <strong>multimodal metadata</strong>.</p>
<p>Your job is to own the end-to-end data lifecycle: <strong>acquisition, ingestion, cleaning, versioning, labeling, quality control, governance, and delivery to researchers</strong>. You are the central node ensuring our foundation model teams and medical agent teams have clean, scalable, well-documented data pipelines.</p>
<p>This is a foundational role: without great data, large models cannot be great.</p>
<p> </p>
<p><strong>What You Will Work On</strong></p>
<p><strong>Multimodal Medical Data Ops</strong></p>
<ul>
<li>Oversee ingestion and processing of <strong>3D medical volumes</strong> (DICOM, NIfTI, MHA) and associated clinical texts.</li>
<li>Build automated pipelines for <strong>metadata extraction</strong>, <strong>de-identification</strong>, <strong>slice/series validation</strong>, and <strong>cohort structuring</strong>.</li>
<li>Manage large-scale internal datasets and external research datasets (BraTS, LiTS, MIMIC-CXR, CheXpert, MosMed, etc.).</li>
</ul>
<p><strong>Data Infrastructure & Versioning</strong></p>
<ul>
<li>Implement scalable <strong>data storage, cataloging, and retrieval</strong> systems for multimodal training data.</li>
<li>Own dataset <strong>version control</strong>, lineage tracking, reproducibility, and dataset documentation.</li>
<li>Collaborate with ML systems engineers on high-throughput <strong>data loaders, sharding strategies, and caching mechanisms</strong>.</li>
</ul>
<p><strong>Annotation & Labeling Programs</strong></p>
<ul>
<li>Lead medical annotation workflows with radiologists, medical students, and labeling vendors.</li>
<li>Create guidelines for <strong>ROI labeling</strong>, <strong>segmentation</strong>, <strong>captioning</strong>, <strong>report alignment</strong>, and <strong>case-level curation</strong>.</li>
<li>Build <strong>semi-automated labeling pipelines</strong> using model-assisted tools.</li>
</ul>
<p><strong>Data Quality, Compliance & Governance</strong></p>
<ul>
<li>Enforce strict standards for data <strong>quality</strong>, <strong>completeness</strong>, <strong>consistency</strong>, and <strong>bias control</strong>.</li>
<li>Ensure adherence to <strong>medical data privacy</strong>, <strong>HIPAA-equivalent frameworks</strong>, and institutional data-sharing rules.</li>
<li>Manage PHI de-identification, audit logs, access control, and compliance approvals.</li>
</ul>
<p><strong>Collaboration with Research & Engineering</strong></p>
<ul>
<li>Work closely with foundation-model researchers to understand data needs for model training.</li>
<li>Partner with agentic system designers to supply structured datasets for clinical reasoning tasks.</li>
<li>Collaborate with foundational engineers on data access layers, performance bottlenecks, and dataset optimization.</li>
</ul>
<p> </p>
<p><strong>Why This Role Is Critical</strong></p>
<ul>
<li>The foundation model relies on <strong>high-quality 3D and textual data</strong> at scale.</li>
<li>You shape the <strong>data pipelines</strong> enabling next-generation medical AI agents.</li>
<li>You ensure <strong>clinical-grade governance</strong>, safety, reproducibility, and trust.</li>
<li>Your systems become the backbone for research, experiments, and deployments.</li>
</ul>
<p>For candidates motivated by the intersection of data, healthcare, and machine learning, this is a high-impact opportunity.</p>
<p> </p>
<p><strong>What We’re Looking For</strong></p>
<ul>
<li>Strong experience managing <strong>large multimodal or imaging datasets</strong>, ideally medical imaging.</li>
<li>Proficiency with <strong>DICOM/DICOMweb</strong>, NIfTI, PACS systems, and medical imaging toolkits (dicompyler, pydicom, MONAI, ITK).</li>
<li>Experience with <strong>ETL pipelines</strong>, distributed data systems, and cloud/on-prem storage.</li>
<li>Knowledge of <strong>metadata standards</strong>, ontologies, and text–image linking strategies.</li>
<li>Comfortable working with Python, SQL, and data tooling (Airflow, Prefect, Dagster, dbt, Delta Lake, etc.).</li>
<li>Understanding of <strong>data privacy</strong>, de-identification, and compliance requirements in healthcare.</li>
<li>Strong communication skills and the ability to coordinate between engineers, researchers, clinicians, and data partners.</li>
</ul>
<p> </p>
<p><strong>Nice to Have</strong></p>
<ul>
<li>Experience with <strong>vector databases</strong>, multimodal retrieval, or embedding store design.</li>
<li>Familiarity with annotation tools and services (Labelbox, CVAT, iMerit, custom MONAI Label pipelines).</li>
<li>Prior work with clinical NLP datasets or multilingual Indian medical corpora.</li>
<li>Experience conducting <strong>bias audits</strong>, dataset characterization, or quality scoring at scale.</li>
<li>Contributions to open datasets, benchmarks, or data documentation frameworks.</li>
</ul>
<p> </p>
<p><strong>What We Offer</strong></p>
<ul>
<li>Competitive compensation.</li>
<li>Access to one of the most ambitious <strong>medical multimodal datasets</strong> in the region.</li>
<li>Collaboration with scientists building India’s first 3D multimodal medical foundation model.</li>
<li>Autonomy to design data systems from the ground up.</li>
<li>A mission-driven team working to transform clinical care with agentic AI.</li>
</ul>