A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset

Stefano Woerner, Arthur Jaques, Christian F. Baumgartner

TL;DR

MedIMeta assembles 19 openly licensed medical imaging datasets across 10 domains into a unified meta-dataset of $224 \times 224$ images and 54 tasks, with ready-to-use PyTorch loaders and predefined splits for single-domain, cross-domain, and few-shot evaluation. The paper validates the dataset through fully supervised baselines and cross-domain few-shot learning experiments, comparing ImageNet pre-training (IM-PT), multi-domain multi-task pre-training (mm-PT), and multi-domain MAML (mm-MAML). Key contributions include a standardized, multi-task, multi-domain medical imaging benchmark, with open-source code, data loaders, and Zenodo data releases that enable easy integration with existing pipelines. This dataset enables robust cross-domain evaluation and fosters research into generalizable medical-image representations and few-shot learning across diverse clinical tasks.

Abstract

While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, the main challenge of these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters and therefore require extensive preprocessing and standardization before they can be used in machine learning. To address these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all of which are standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta, demonstrating its utility through fully supervised and cross-domain few-shot learning baselines.

Paper Structure (3 sections, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Example images of all MedIMeta datasets.
  • Figure 2: An overview of the CD-FSL scenario: The few-shot learner is first trained on the meta-dataset of highly diverse training data. It is then adapted to a new task from a new domain using the labeled examples from the support set of a few-shot task. Performance is assessed using a query set from the same task.
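
The support/query split described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the MedIMeta loader or the authors' sampling code: the function name `sample_episode`, its parameters (`n_way`, `k_shot`, `n_query`), and the toy label list are all assumptions made for illustration. It only shows how the labeled examples of one task might be divided into a support set used for adaptation and a disjoint query set used for evaluation.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, seed=None):
    """Hypothetical helper: pick indices for one few-shot task.

    Returns (support, query): two disjoint lists of example indices,
    with k_shot support and n_query query examples per sampled class.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough examples can fill both sets.
    eligible = sorted(c for c, idxs in by_class.items()
                      if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])   # labeled examples used for adaptation
        query.extend(picked[k_shot:])     # held-out examples used for evaluation
    return support, query


# Toy labels standing in for one task of an unseen dataset (assumed data).
labels = [i % 3 for i in range(30)]
support, query = sample_episode(labels, n_way=2, k_shot=5, n_query=3, seed=0)
```

In a cross-domain setting, the task passed to such a helper would come from a dataset held out of the meta-training pool, so that both the domain and the task are new to the learner.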