
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye, Guoan Wang, Tianbin Li, Haoyu Wang, Ruoyu Li, He Yao, Junren Chen, Jingwen Li, Yanzhou Su, Min Zhu, Junjun He

TL;DR

The paper introduces IMed-361M, a benchmark dataset that marks a significant advancement in general IMIS research, and develops on it an IMIS baseline network that generates high-quality masks from interactive inputs, including clicks, bounding boxes, text prompts, and their combinations.

Abstract

Interactive Medical Image Segmentation (IMIS) has long been constrained by the limited availability of large-scale, diverse, and densely annotated datasets, which hinders model generalization and consistent evaluation across different models. In this paper, we introduce the IMed-361M benchmark dataset, a significant advancement in general IMIS research. First, we collect and standardize over 6.4 million medical images and their corresponding ground-truth masks from multiple data sources. Then, leveraging the strong object recognition capabilities of a vision foundation model, we automatically generate dense interactive masks for each image and ensure their quality through rigorous quality control and granularity management. Unlike previous datasets, which are limited to specific modalities or provide only sparse annotations, IMed-361M spans 14 modalities and 204 segmentation targets, totaling 361 million masks (an average of 56 masks per image). Finally, we develop an IMIS baseline network on this dataset that supports high-quality mask generation through interactive inputs, including clicks, bounding boxes, text prompts, and their combinations. We evaluate its performance on medical image segmentation tasks from multiple perspectives, demonstrating superior accuracy and scalability compared to existing interactive segmentation models. To facilitate research on foundation models in medical computer vision, we release IMed-361M and the model at https://github.com/uni-medical/IMIS-Bench.
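The abstract states that the baseline accepts clicks and bounding boxes as prompts. A minimal sketch of how such prompts can be derived from a binary ground-truth mask follows. It assumes masks are plain lists of 0/1 rows; the helper names `mask_to_box` and `mask_to_click` and the "foreground pixel nearest the centroid" click heuristic are illustrative assumptions, not the IMIS-Bench implementation.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask indexed as mask[row][col]


def _foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates of all foreground pixels in row-major order."""
    return [(c, r) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return the tight box (x_min, y_min, x_max, y_max) around the foreground, or None."""
    coords = _foreground(mask)
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def mask_to_click(mask: Mask) -> Optional[Tuple[int, int]]:
    """Return a positive click (x, y): the foreground pixel nearest the mask centroid.

    Hypothetical heuristic; ties are broken by row-major order.
    """
    coords = _foreground(mask)
    if not coords:
        return None
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


if __name__ == "__main__":
    mask = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    print(mask_to_box(mask))    # (1, 1, 3, 2)
    print(mask_to_click(mask))  # (2, 1)
```

For a foreground block three pixels wide and two tall, the centroid is (2, 1.5), so the click lands on (2, 1), the first of the two equally near pixels. Picking an actual foreground pixel rather than the raw centroid keeps the click on the object even for concave shapes.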

Paper Structure

This paper contains 15 sections, 13 figures, and 3 tables.

Figures (13)

  • Figure 1: We collected 110 medical image datasets from various sources and generated the IMed-361M dataset, which contains over 361 million masks, through a rigorous and standardized data processing pipeline. Using this dataset, we developed the IMIS baseline network.
  • Figure 2: Overview of the IMed-361M dataset. (a) Number of images and masks for each modality. (b) Information on six anatomical structures. (c) Distribution of image resolutions. (d) Analysis of mask proportions. (e) Comparison with other existing public datasets.
  • Figure 3: Evaluation of the quality of interactive masks.
  • Figure 4: The training process of IMIS-Net simulates K consecutive steps of interactive segmentation.
  • Figure 5: Comparison of IMIS-Net with existing foundation models, with performance statistics at both image and mask levels.
  • ...and 8 more figures
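Figure 4 describes training that simulates K consecutive interaction steps. The sketch below shows one common way to run such a simulation: at each step a corrective click is sampled from the larger error region (a positive click in missed foreground, a negative click in spurious foreground), and the model is re-run with all clicks so far plus its own previous prediction. This click-sampling rule, the feedback of the previous mask, and the names `next_click`, `simulate_interaction`, and `toy_model` are assumptions for illustration; `toy_model` is a hypothetical stand-in for IMIS-Net that only sets the clicked pixel.

```python
import random
from typing import Callable, List, Optional, Tuple

Mask = List[List[int]]        # binary mask indexed as mask[row][col]
Click = Tuple[int, int, int]  # (x, y, label): label 1 = positive, 0 = negative
Model = Callable[[List[Click], Mask], Mask]


def next_click(pred: Mask, gt: Mask, rng: random.Random) -> Optional[Click]:
    """Sample a corrective click from the larger error region, or None if pred == gt."""
    # False negatives: ground-truth foreground that the prediction missed
    fn = [(c, r) for r, row in enumerate(gt) for c, v in enumerate(row) if v and not pred[r][c]]
    # False positives: predicted foreground that is background in the ground truth
    fp = [(c, r) for r, row in enumerate(gt) for c, v in enumerate(row) if not v and pred[r][c]]
    if not fn and not fp:
        return None
    if len(fn) >= len(fp):
        x, y = rng.choice(fn)
        return (x, y, 1)
    x, y = rng.choice(fp)
    return (x, y, 0)


def simulate_interaction(model: Model, gt: Mask, k: int, seed: int = 0) -> Tuple[List[Click], Mask]:
    """Run up to k simulated interaction steps, starting from an empty prediction."""
    rng = random.Random(seed)
    pred: Mask = [[0] * len(gt[0]) for _ in gt]
    clicks: List[Click] = []
    for _ in range(k):
        click = next_click(pred, gt, rng)
        if click is None:  # prediction already matches the ground truth
            break
        clicks.append(click)
        # The model sees every click so far plus its own previous mask
        pred = model(clicks, pred)
    return clicks, pred


def toy_model(clicks: List[Click], prev_pred: Mask) -> Mask:
    """Hypothetical stand-in for IMIS-Net: copy the previous mask and set the clicked pixel."""
    pred = [row[:] for row in prev_pred]
    x, y, label = clicks[-1]
    pred[y][x] = label
    return pred


if __name__ == "__main__":
    gt = [[0, 1, 1],
          [0, 1, 0]]
    clicks, pred = simulate_interaction(toy_model, gt, k=5)
    print(len(clicks), pred == gt)  # 3 True
```

With the toy model, each click fixes exactly one pixel, so the three-pixel target is recovered after three clicks and the loop stops early, before reaching k = 5.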