A Foundation Model for General Moving Object Segmentation in Medical Images

Zhongnuo Yan, Tong Han, Yuhao Huang, Lian Liu, Han Zhou, Jiongquan Chen, Wenlong Shi, Yan Cao, Xin Yang, Dong Ni

TL;DR

The paper tackles the high annotation burden of medical image segmentation for videos and 3D volumes by introducing iMOS, a foundation model for semi-supervised moving object segmentation (MOS) in medical images. Building on a memory-based MOS framework, it uses sensory, working, and long-term memories, memory reading/updating, prototype selection, and memory potentiation, together with parameter-efficient adapters to align medical data with existing MOS paradigms. Experiments on a dataset covering five imaging modalities show that fine-tuning yields strong performance and generalization, with bidirectional segmentation from a single annotated frame and good results on unseen categories. The authors hope iMOS will speed up expert annotation and support the development of medical foundation models.

Abstract

Medical image segmentation aims to delineate the anatomical or pathological structures of interest, playing a crucial role in clinical diagnosis. A substantial amount of high-quality annotated data is essential for constructing high-precision deep segmentation models. However, medical annotation is highly cumbersome and time-consuming, especially for medical videos or 3D volumes, due to the huge labeling space and poor inter-frame consistency. Recently, a fundamental task named Moving Object Segmentation (MOS) has made significant advancements in natural images. Its objective is to delineate moving objects from the background within image sequences, requiring only minimal annotations. In this paper, we propose the first foundation model, named iMOS, for MOS in medical images. Extensive experiments on a large multi-modal medical dataset validate the effectiveness of the proposed iMOS. Specifically, with the annotation of only a small number of images in the sequence, iMOS can achieve satisfactory tracking and segmentation performance of moving objects throughout the entire sequence in both directions. We hope that the proposed iMOS can help speed up expert annotation and boost the development of medical foundation models.
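
The idea of propagating a few annotated frames through the whole sequence in both directions can be illustrated with a minimal Python sketch. It is not the iMOS implementation: the function name `propagate_bidirectional` and the callback `step` are hypothetical, and `step` stands in for a model call that predicts a mask for one frame from an adjacent, already segmented frame. The memory mechanisms described in the paper are deliberately left out; only the traversal order is shown.

```python
from typing import Callable, Dict, TypeVar

Mask = TypeVar("Mask")


def propagate_bidirectional(
    num_frames: int,
    annotated_idx: int,
    annotated_mask: Mask,
    step: Callable[[int, int, Mask], Mask],
) -> Dict[int, Mask]:
    """Spread one annotated mask to every frame of a sequence.

    `step(src_idx, dst_idx, src_mask)` is a hypothetical placeholder for a
    model call that returns the mask of frame `dst_idx`, given the mask of
    the neighbouring frame `src_idx`.
    """
    if not 0 <= annotated_idx < num_frames:
        raise ValueError("annotated_idx must be a valid frame index")

    masks: Dict[int, Mask] = {annotated_idx: annotated_mask}

    # Forward pass: from the annotated frame towards the end of the sequence.
    for t in range(annotated_idx + 1, num_frames):
        masks[t] = step(t - 1, t, masks[t - 1])

    # Backward pass: from the annotated frame towards the first frame.
    for t in range(annotated_idx - 1, -1, -1):
        masks[t] = step(t + 1, t, masks[t + 1])

    return masks


if __name__ == "__main__":
    # Toy "mask": an integer counting how many propagation steps were taken.
    result = propagate_bidirectional(5, 2, 0, lambda src, dst, m: m + 1)
    for frame in sorted(result):
        print(frame, result[frame])
```

In the toy usage, each frame's value is its distance from the annotated frame, which makes the two traversal directions easy to check by eye.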

Paper Structure (13 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Process of semi-supervised MOS. (A) represents the manual annotation. (B)-(D) are segmentation results.
  • Figure 2: Architecture of iMOS. Lines with different colors are pathways of different memory systems.
  • Figure 3: Architecture (A) and implementation scheme (B, C) of adapter module. $\alpha$ controls output weight; (B) and (C) are residual blocks in query and value encoder, respectively.
  • Figure 4: Examples of the five modalities in the dataset.
  • Figure 5: Examples of segmentation results. The first image in the left column is the manually annotated mask.
  • ...and 2 more figures