Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation

Qing Xu, Yuxiang Luo, Wenting Duan, Zhen Chen

TL;DR

The paper addresses the need for versatile medical segmentation by coupling semantic and instance tasks rather than treating them in isolation. It introduces Co-Seg++ with a spatio-sequential prompt encoder (SSP-Encoder) and a multi-task collaborative decoder (MTC-Decoder) to enable mutual, bidirectional guidance between tasks. Across histopathology and dental CBCT datasets, Co-Seg++ achieves state-of-the-art performance in semantic, instance, and panoptic segmentation, while offering better efficiency and robustness to domain shifts and limited annotations. Ablation and interpretability analyses demonstrate the tangible benefits of the co-segmentation paradigm and cross-task prompts, guiding future work toward 3D and multi-modal medical image analysis.

Abstract

Medical image analysis is critical, yet it is challenged by the need to jointly segment organs or tissues and numerous instances for the analysis of anatomical structures and the tumor microenvironment. Existing studies typically formulate different segmentation tasks in isolation, which overlooks the fundamental interdependencies between these tasks and leads to suboptimal segmentation performance and insufficient medical image understanding. To address this issue, we propose the Co-Seg++ framework for versatile medical segmentation. Specifically, we introduce a novel co-segmentation paradigm that allows semantic and instance segmentation tasks to mutually enhance each other. We first devise a spatio-sequential prompt encoder (SSP-Encoder) to capture long-range spatial and sequential relationships between segmentation regions and image embeddings as prior spatial constraints. Moreover, we devise a multi-task collaborative decoder (MTC-Decoder) that leverages cross-guidance to strengthen the contextual consistency of both tasks, jointly computing semantic and instance segmentation masks. Extensive experiments on diverse CT and histopathology datasets demonstrate that the proposed Co-Seg++ outperforms state-of-the-art methods in the semantic, instance, and panoptic segmentation of dental anatomical structures, histopathology tissues, and nuclei instances. The source code is available at https://github.com/xq141839/Co-Seg-Plus.

Paper Structure

This paper contains 23 sections, 12 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Comparison of our Co-Seg++ and existing segmentation works. (a) Two independent networks for semantic and instance segmentation. (b) A shared image encoder but separated task decoders for semantic and instance segmentation. (c) Our Co-Seg++ leverages spatio-temporal prompts for collaborative semantic and instance segmentation.
  • Figure 2: Regional interdependencies between semantic and instance segmentation tasks in nuclei images. Semantic regions (e.g., glands) often encompass multiple types of instance-level structures (e.g., lympho-reticular (LR) and connective (Con) nuclei). The relationship of semantic and instance segmentation confirms contextual understanding and motivates the proposed joint optimization of versatile medical segmentation.
  • Figure 3: The overview of our Co-Seg++ framework for collaborative versatile medical segmentation, consisting of SSP-Encoder that captures long-range spatial and sequential relationships and MTC-Decoder with semantic and instance heads for mutual task guidance. Co-Seg++ operates through two forward passes: the first pass generates binary masks from both heads as prior spatial constraints supervised by a spatial consistency constraint loss. In the second forward pass, the semantic and instance heads in MTC-Decoder leverage these constraints via cross-attention to mutually enhance semantic region delineation and instance classification. The Co-Seg++ framework fully exploits complementary contextual information through spatio-temporal prompts.
  • Figure 4: The illustration of SSP-Encoder, integrating spatial prompts and sequential memory to establish long-range relationships between target segmentation regions and shared image embeddings, providing prior spatial constraints for cross-task guidance.
  • Figure 5: The architecture of the MTC-Decoder, employing cross-guidance mechanisms and probability distribution alignment to enable semantic and instance segmentation tasks to mutually enhance each other while ensuring spatial consistency in segmentation decoding.
  • ...and 5 more figures
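
The two-pass scheme described in the Figure 3 caption can be illustrated with a toy sketch. This is not the authors' implementation: the linear heads, the `prompt_from_mask` helper, the residual fusion, and all weights are hypothetical stand-ins chosen only to show the data flow, in which a first pass yields binary masks from both heads and a second pass lets each head attend to the other head's mask-derived prompt via cross-attention.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def head(features, weights):
    """Toy linear head (assumption): one logit per pixel embedding."""
    return [sum(f * w for f, w in zip(feat, weights)) for feat in features]


def binarize(logits):
    """Threshold logits at zero to obtain a binary mask."""
    return [1 if z > 0 else 0 for z in logits]


def prompt_from_mask(features, mask):
    """Hypothetical prompt: embeddings of foreground pixels only.

    Falls back to all embeddings when the mask is empty.
    """
    foreground = [f for f, m in zip(features, mask) if m == 1]
    return foreground if foreground else features


def two_pass(features, w_sem, w_ins):
    """Sketch of mutual prompt guidance between two segmentation heads."""
    # Pass 1: each head predicts a binary mask on its own.
    sem_mask = binarize(head(features, w_sem))
    ins_mask = binarize(head(features, w_ins))

    # The masks become prompts (prior spatial constraints).
    sem_prompt = prompt_from_mask(features, sem_mask)
    ins_prompt = prompt_from_mask(features, ins_mask)

    # Pass 2: each head attends to the OTHER task's prompt.
    sem_ctx = cross_attention(features, ins_prompt, ins_prompt)
    ins_ctx = cross_attention(features, sem_prompt, sem_prompt)

    # Residual fusion (assumption), then re-run the heads.
    sem_feat = [[a + b for a, b in zip(f, c)] for f, c in zip(features, sem_ctx)]
    ins_feat = [[a + b for a, b in zip(f, c)] for f, c in zip(features, ins_ctx)]
    return head(sem_feat, w_sem), head(ins_feat, w_ins), sem_mask, ins_mask


if __name__ == "__main__":
    # Four 2-D "pixel embeddings" with made-up head weights.
    feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -1.0]]
    sem_logits, ins_logits, sem_mask, ins_mask = two_pass(
        feats, [1.0, 0.0], [0.0, 1.0]
    )
    print("pass-1 masks:", sem_mask, ins_mask)
    print("pass-2 logits:", sem_logits, ins_logits)
```

In this sketch the cross-task signal enters only through the attention context added to each pixel embedding. The actual framework also uses sequential memory in the SSP-Encoder and a spatial consistency constraint loss, neither of which is modeled here.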