WT-BCP: Wavelet Transform based Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation

Mingya Zhang, Liang Wang, Limei Gu, Tingsheng Ling, Xianping Tao

TL;DR

WT-BCP tackles distribution mismatch and perturbation biases in semi-supervised medical image segmentation by coupling a bidirectional copy-paste data augmentation with Wavelet Transform–based low-frequency ($LF$) and high-frequency ($HF$) decomposition and a tri-branch XNet-Plus to fuse multi-frequency representations. The Mean Teacher framework provides consistency training across multiple outputs, aided by loss terms $L_{in}$, $L_{out}$, and $L_{con}$ with weights $\alpha$ and $\beta$ and EMA-based teacher updates. Across 2D and 3D datasets (ACDC, LA, Pancreas-NIH), WT-BCP achieves state-of-the-art or competitive performance, delivering higher Dice and Jaccard scores while improving boundary metrics such as $95HD$ and ASD, especially at low labeling ratios. By leveraging $LF$/$HF$ fusion and multi-output supervision, the approach reduces labeled data requirements and enhances detail preservation, offering a practical path toward robust clinical segmentation.
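The EMA-based teacher update mentioned above can be illustrated with the standard Mean Teacher rule, in which each teacher parameter is an exponential moving average of the corresponding student parameter. This is a minimal sketch, not the authors' code: the dictionary-of-arrays parameter layout, the function name `ema_update`, and the decay value are assumptions made for illustration.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Return teacher parameters after one EMA step.

    Each parameter becomes decay * teacher + (1 - decay) * student.
    The decay value is a placeholder; the summary does not state it.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }

# Toy parameters standing in for network weights (hypothetical).
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher is never updated by gradients, its predictions change more slowly than the student's, which is what makes it usable as a pseudo-label source.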

Abstract

Semi-supervised medical image segmentation (SSMIS) shows promise in reducing reliance on scarce labeled medical data. However, the SSMIS field confronts challenges such as distribution mismatches between labeled and unlabeled data, artificial perturbations that cause training biases, and inadequate use of raw image information, especially low-frequency (LF) and high-frequency (HF) components. To address these challenges, we propose a Wavelet Transform based Bidirectional Copy-Paste SSMIS framework, named WT-BCP, which improves upon the Mean Teacher approach. Our method enhances the understanding of unlabeled data by copying random crops between labeled and unlabeled images, and it employs the wavelet transform (WT) to extract LF and HF details. We propose a multi-input, multi-output model named XNet-Plus to receive the fused information after WT. Moreover, consistency training among multiple outputs helps to mitigate learning biases introduced by artificial perturbations. During consistency training, the mixed images resulting from WT are fed into both models, with the student model's output being supervised by pseudo-labels and ground truth. Extensive experiments conducted on 2D and 3D datasets confirm the effectiveness of our model. Code: https://github.com/simzhangbest/WT-BCP.
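The bidirectional copy-paste step described in the abstract can be sketched as mask-based mixing in both directions: a random crop of the labeled image is pasted onto the unlabeled image, and the same crop region of the unlabeled image is pasted onto the labeled image. The labels are mixed with the same mask, using the teacher's pseudo-label for the unlabeled image. This is a simplified 2D illustration with one labeled and one unlabeled image; the function names, the crop ratio, and the toy arrays are assumptions, not the authors' implementation.

```python
import numpy as np

def random_box_mask(shape, ratio=0.5, rng=None):
    """Boolean mask that is True inside one randomly placed box."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    bh, bw = int(h * ratio), int(w * ratio)  # box size (ratio is assumed)
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    mask = np.zeros(shape, dtype=bool)
    mask[top:top + bh, left:left + bw] = True
    return mask

def copy_paste(foreground, background, mask):
    """Take pixels from `foreground` inside the box, `background` elsewhere."""
    return np.where(mask, foreground, background)

rng = np.random.default_rng(0)
x_l = rng.random((8, 8))               # labeled image (toy data)
y_l = np.ones((8, 8), dtype=int)       # its ground-truth label
x_u = rng.random((8, 8))               # unlabeled image (toy data)
y_u_pseudo = np.zeros((8, 8), dtype=int)  # stand-in for a teacher pseudo-label

mask = random_box_mask(x_l.shape, ratio=0.5, rng=rng)

# Direction 1: labeled crop pasted onto the unlabeled image.
x_lu = copy_paste(x_l, x_u, mask)
y_lu = copy_paste(y_l, y_u_pseudo, mask)

# Direction 2: unlabeled crop pasted onto the labeled image.
x_ul = copy_paste(x_u, x_l, mask)
y_ul = copy_paste(y_u_pseudo, y_l, mask)
```

Every mixed image contains both ground-truth and pseudo-labeled pixels, so the student sees labeled and unlabeled content under the same transformation rather than in separate batches.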

Paper Structure

This paper contains 18 sections, 13 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Visualization of LF and HF results, using ACDC as an example. a. Raw image. b. Wavelet transform results. c. LF image. d. HF image.
  • Figure 2: Overview of our WT-BCP framework for SSMIS. The framework adopts a BCP strategy, merging two labeled and two unlabeled images to create mixed images. The teacher generates pseudo-labels from the unlabeled images. Ground truths and pseudo-labels are then mixed to produce mixed labels. The student and teacher models are both XNet-Plus, as illustrated in Figure 3.
  • Figure 3: Overview of the XNet-Plus model. XNet-Plus consists of a main network $M$, an LF network $L$, and an HF network $H$, and takes the raw image $X_{M}$, the LF complementary fusion image $X_{L}$, and the HF complementary fusion image $X_{H}$ as input.
  • Figure 4: Topological flow chart of XNet-Plus data processing.
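
The LF and HF images of the kind shown in Figure 1 can be approximated with a single-level 2D wavelet transform. The sketch below uses an orthonormal Haar wavelet written directly in NumPy; the choice of Haar, taking the approximation band as the LF image, and summing the three detail bands into one HF image are all assumptions for illustration, and this listing does not specify the wavelet or the fusion rule the paper uses. Sub-band naming conventions also vary between libraries.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D orthonormal Haar transform of an even-sized image.

    Returns the approximation band and three detail bands, each half the
    input size along both axes.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0         # local average (low frequency)
    detail_rows = (a + b - c - d) / 2.0    # difference between the two rows
    detail_cols = (a - b + c - d) / 2.0    # difference between the two columns
    detail_diag = (a - b - c + d) / 2.0    # diagonal difference
    return approx, detail_rows, detail_cols, detail_diag

def lf_hf_images(img):
    """Return an LF image (approximation) and an HF image (summed details)."""
    approx, d_rows, d_cols, d_diag = haar_dwt2(img)
    return approx, d_rows + d_cols + d_diag

img = np.arange(16, dtype=float).reshape(4, 4)  # toy image
lf, hf = lf_hf_images(img)
```

The LF image keeps smooth intensity structure, while the HF image responds to edges, which is why the two are described as complementary inputs for the LF and HF branches.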