Pipeline

© 2026 Pipeline. All rights reserved.

SAIGroup

Data Manager — Multimodal Medical Foundation Models at SAIGroup

Bangalore | Full-time | Product - Concert AI | Posted 25 days ago

About the Role

You will lead data operations for a cutting-edge research group developing 3D medical multimodal foundation models and agentic clinical AI systems. These models rely on extremely high-quality, well-structured, and compliant datasets, including 3D medical imaging volumes (MRI, CT, PET), clinical text corpora, annotations, and multimodal metadata.

Your job is to own the end-to-end data lifecycle: acquisition, ingestion, cleaning, versioning, labeling, quality control, governance, and delivery to researchers. You are the central node ensuring our foundation model teams and medical agent teams have clean, scalable, well-documented data pipelines.

This is a pivotal, foundational role: without great data, large models cannot be great.

What You Will Work On

Multimodal Medical Data Ops

  • Oversee ingestion and processing of 3D medical volumes (DICOM, NIfTI, MHA) and associated clinical texts.
  • Build automated pipelines for metadata extraction, de-identification, slice/series validation, and cohort structuring.
  • Manage large-scale internal datasets and external research datasets (BraTS, LiTS, MIMIC-CXR, CheXpert, MosMed, etc.).

Data Infrastructure & Versioning

  • Implement scalable data storage, cataloging, and retrieval systems for multimodal training data.
  • Own dataset version control, lineage tracking, reproducibility, and dataset documentation.
  • Collaborate with ML systems engineers on high-throughput data loaders, sharding strategies, and caching mechanisms.

Annotation & Labeling Programs

  • Lead medical annotation workflows with radiologists, medical students, and labeling vendors.
  • Create guidelines for ROI labeling, segmentation, captioning, report alignment, and case-level curation.
  • Build semi-automated labeling pipelines using model-assisted tools.

Data Quality, Compliance & Governance

  • Enforce strict standards on data quality, completeness, consistency, and bias control.
  • Ensure adherence to medical data privacy, HIPAA-equivalent frameworks, and institutional data-sharing rules.
  • Manage PHI de-identification, audit logs, access control, and compliance approvals.

Collaboration with Research & Engineering

  • Work closely with foundation-model researchers to understand data needs for model training.
  • Partner with agentic system designers to supply structured datasets for clinical reasoning tasks.
  • Collaborate with foundational engineers on data access layers, performance bottlenecks, and dataset optimization.

Why This Role Is Critical

  • The foundation model relies on high-quality 3D and textual data at scale.
  • You shape the data pipelines enabling next-generation medical AI agents.
  • You ensure clinical-grade governance, safety, reproducibility, and trust.
  • Your systems become the backbone for research, experiments, and deployments.

For candidates motivated by the intersection of data, healthcare, and machine learning, this is a high-impact opportunity.

What We’re Looking For

  • Strong experience managing large multimodal or imaging datasets, ideally medical imaging.
  • Proficiency with DICOM/DICOMweb, NIfTI, PACS systems, and medical imaging toolkits (dicompyler, pydicom, MONAI, ITK).
  • Experience with ETL pipelines, distributed data systems, and cloud/on-prem storage.
  • Knowledge of metadata standards, ontologies, and text–image linking strategies.
  • Comfortable working with Python, SQL, and data tooling (Airflow, Prefect, Dagster, dbt, Delta Lake, etc.).
  • Understanding of data privacy, de-identification, and compliance requirements in healthcare.
  • Strong communication skills and the ability to coordinate between engineers, researchers, clinicians, and data partners.

Nice to Have

  • Experience with vector databases, multimodal retrieval, or embedding store design.
  • Familiarity with annotation tools (Labelbox, CVAT, iMerit, custom MONAI Label pipelines).
  • Prior work with clinical NLP datasets or multilingual Indian medical corpora.
  • Experience conducting bias audits, dataset characterization, or quality scoring at scale.
  • Contributions to open datasets, benchmarks, or data documentation frameworks.

What We Offer

  • Competitive compensation.
  • Access to one of the most ambitious medical multimodal datasets in the region.
  • Collaboration with scientists building India’s first 3D multimodal medical foundation model.
  • Autonomy to design data systems from the ground up.
  • A mission-driven team working to transform clinical care with agentic AI.

Related Roles

  • Clinical Operations Manager — Clinical Validation & Medical AI Studies

    SAIGroup

    Bangalore
  • Expert Physician — Radiologist

    SAIGroup

    Bangalore
  • Foundational Model Engineer — Multimodal & Agentic Medical AI Systems

    SAIGroup

    Bangalore
  • Senior Agentic AI Engineer — Agentic Medical AI

    SAIGroup

    Bangalore
  • Senior Developer — Agentic Clinical Workflow & Orchestration

    SAIGroup

    Bangalore
  • Pre-Sales Architect

    SAIGroup

    Los Altos, CA