
MatchSeg: Towards Better Segmentation via Reference Image Matching

Jiayu Huo, Ruiqiang Xiao, Haotian Zheng, Yang Liu, Sebastien Ourselin, Rachel Sparks

TL;DR

Medical image segmentation often requires large labeled datasets. MatchSeg addresses this by reframing segmentation as a reference-image matching task, guided by CLIP and enhanced by a joint attention mechanism. The method selects a highly relevant support set using a CLIP image encoder and uses a Joint Attention module to align query and support features for knowledge transfer, optimizing a composite loss of Dice, BCE, and Focal terms. Across HAM10000, GlaS, BUS, and BUSI, MatchSeg achieves superior segmentation accuracy and strong cross-domain generalization, outperforming both fully supervised and prior few-shot methods. The results suggest practical value for rapidly adapting segmentation models to new imaging domains, and the code is publicly available.
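The composite training objective combines three per-pixel terms. Below is a minimal, stdlib-only sketch of a Dice + BCE + Focal loss on flattened probability maps. The equal weights (`w_dice`, `w_bce`, `w_focal`) and the focal exponent `gamma=2.0` are assumptions for illustration, not values taken from the paper, and all function names are hypothetical.

```python
import math

def dice_loss(probs, targets, eps=1e-6):
    # Soft Dice: 1 - 2|P∩G| / (|P| + |G|), computed directly on probabilities
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def bce_loss(probs, targets, eps=1e-7):
    # Mean binary cross-entropy over pixels
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    # Focal loss down-weights easy pixels by (1 - p_t)^gamma
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if t == 1 else 1.0 - p  # probability assigned to the true class
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

def composite_loss(probs, targets, w_dice=1.0, w_bce=1.0, w_focal=1.0):
    # Weighted sum of the three terms (weights are assumed, not from the paper)
    return (w_dice * dice_loss(probs, targets)
            + w_bce * bce_loss(probs, targets)
            + w_focal * focal_loss(probs, targets))

# Usage: a near-perfect mask scores far lower than a poor one
good = composite_loss([0.99, 0.01, 0.98, 0.02], [1, 0, 1, 0])
bad = composite_loss([0.2, 0.8, 0.3, 0.7], [1, 0, 1, 0])
```

Dice targets region overlap and is robust to foreground/background imbalance, while BCE and Focal supervise each pixel; Focal additionally emphasizes hard pixels, which is one common motivation for combining them.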

Abstract

Recently, automated medical image segmentation methods based on deep learning have achieved great success. However, they rely heavily on large annotated datasets, which are costly and time-consuming to acquire. Few-shot learning aims to overcome the need for annotated data by using a small labeled dataset, known as a support set, to guide label prediction for new, unlabeled images, known as the query set. Inspired by this paradigm, we introduce MatchSeg, a novel framework that enhances medical image segmentation through strategic reference image matching. We leverage contrastive language-image pre-training (CLIP) to select highly relevant samples when defining the support set. Additionally, we design a joint attention module to strengthen the interaction between support and query features, facilitating more effective knowledge transfer between the support and query sets. We validated our method on four public datasets. Experimental results demonstrate superior segmentation performance and strong domain generalization of MatchSeg compared with existing methods on both domain-specific and cross-domain segmentation tasks. Our code is available at https://github.com/keeplearning-again/MatchSeg
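Support set selection ranks candidate labeled images by how relevant they are to the query. A minimal sketch, assuming the image embeddings have already been produced by a CLIP image encoder (not shown) and that relevance is measured by cosine similarity with a top-k cutoff; both the ranking rule and the function names are illustrative assumptions, not the released implementation.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def select_support_set(query_emb, candidate_embs, k):
    """Return indices of the k candidates most similar to the query.

    query_emb / candidate_embs are assumed to be precomputed CLIP image
    embeddings (hypothetical inputs for this sketch).
    """
    scores = [(cosine_similarity(query_emb, c), i) for i, c in enumerate(candidate_embs)]
    scores.sort(key=lambda s: s[0], reverse=True)  # most similar first
    return [i for _, i in scores[:k]]

# Usage with toy 2-D "embeddings"
chosen = select_support_set([1.0, 0.0],
                            [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [0.9, 0.2]],
                            k=2)
```

The selected indices would then pick the image-label pairs fed to the segmentation model as its support set, in place of a random draw.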

Paper Structure (14 sections, 5 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: The MatchSeg framework comprises (a) support set selection using a CLIP image encoder, (b) the segmentation model, and (c) the Joint Attention module. In the Joint Attention module, $k$ is the number of support image-label pairs. $C$ is the number of channels of the feature map. $C'$ is the reduced number of channels for lower computational cost. $H\times W$ is the spatial feature size.
  • Figure 2: Segmentation performance of UniverSeg with different support set selection strategies and sizes for the HAM10000 dataset.
  • Figure 3: Visualization of different segmentation methods on BUSI, BUS, GlaS and HAM10000 datasets.
  • Figure 4: Segmentation results visualization of different methods on the HAM10000 dataset within the cross-domain setting. All models were trained on the NV lesion type, but tested on other lesion types.
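The Figure 1 caption describes the Joint Attention module in terms of $k$ support pairs, $C$ channels reduced to $C'$, and $H\times W$ spatial positions. The sketch below is a generic scaled dot-product cross-attention with those shapes, written with plain Python lists; it is an assumption-laden illustration, not the paper's exact module (which may, for example, also update the support features or use learned value projections). The names `joint_attention`, `W_q`, and `W_k` are hypothetical.

```python
import math

def matvec(W, x):
    # W: list of rows (out_dim x in_dim), x: vector of length in_dim
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(query_tokens, support_tokens_per_pair, W_q, W_k):
    """Residual cross-attention from query tokens to all support tokens.

    query_tokens: H*W tokens, each a C-dim vector.
    support_tokens_per_pair: k lists of H*W tokens (C-dim each).
    W_q, W_k: C' x C projections; C' < C lowers the cost of the dot products.
    Returns H*W refined query tokens (C-dim each).
    """
    # Pool all k*H*W support tokens into a single key/value set
    support = [tok for pair in support_tokens_per_pair for tok in pair]
    keys = [matvec(W_k, s) for s in support]
    scale = 1.0 / math.sqrt(len(W_q))  # 1/sqrt(C')
    out = []
    for q in query_tokens:
        q_proj = matvec(W_q, q)
        scores = [scale * sum(a * b for a, b in zip(q_proj, key)) for key in keys]
        weights = softmax(scores)
        # Values are the raw C-dim support tokens, so the output keeps C channels
        attended = [sum(w * s[c] for w, s in zip(weights, support)) for c in range(len(q))]
        out.append([qc + ac for qc, ac in zip(q, attended)])  # residual connection
    return out

# Usage: C = 2, C' = 1, H*W = 1 query token, k = 1 pair with one token
refined = joint_attention([[1.0, 2.0]], [[[3.0, 4.0]]], [[1.0, 0.0]], [[1.0, 0.0]])
```

Projecting queries and keys to $C'$ channels shrinks each similarity computation, while keeping values at $C$ channels lets the attended support information be added back to the query features directly.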