WT-BCP: Wavelet Transform based Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
Mingya Zhang, Liang Wang, Limei Gu, Tingsheng Ling, Xianping Tao
TL;DR
WT-BCP tackles distribution mismatch and perturbation bias in semi-supervised medical image segmentation. It couples bidirectional copy-paste data augmentation with a Wavelet Transform-based decomposition into low-frequency ($LF$) and high-frequency ($HF$) components, and uses a tri-branch network, XNet-Plus, to fuse the multi-frequency representations. Training follows the Mean Teacher framework: consistency is enforced across the multiple outputs through the loss terms $L_{in}$, $L_{out}$, and $L_{con}$, weighted by $\alpha$ and $\beta$, and the teacher is updated as an exponential moving average (EMA) of the student. Across 2D and 3D datasets (ACDC, LA, Pancreas-NIH), WT-BCP achieves state-of-the-art or competitive performance, with higher Dice and Jaccard scores and better boundary metrics, namely the 95% Hausdorff distance (95HD) and the average surface distance (ASD), especially at low labeling ratios. By combining $LF$/$HF$ fusion with multi-output supervision, the approach reduces labeled data requirements and better preserves fine detail, offering a practical path toward robust clinical segmentation.
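As an illustration of the EMA-based teacher update mentioned above, the following is a minimal sketch of the standard Mean Teacher rule, in which each teacher weight is moved toward the corresponding student weight. The function name `ema_update`, the dictionary representation of the weights, and the decay value are assumptions made for this example and are not taken from the WT-BCP code.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student.

    `teacher` and `student` are plain dicts mapping a parameter name to a
    float; this stands in for real network parameters (assumption).
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy weights, chosen only to make the update easy to follow.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

With a decay close to 1, the teacher changes slowly and acts as a smoothed version of the student, which is what makes its predictions usable as pseudo-labels.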
Abstract
Semi-supervised medical image segmentation (SSMIS) shows promise in reducing reliance on scarce labeled medical data. However, the SSMIS field faces several challenges: distribution mismatches between labeled and unlabeled data, training biases caused by artificial perturbations, and inadequate use of raw image information, especially the low-frequency (LF) and high-frequency (HF) components. To address these challenges, we propose a Wavelet Transform based Bidirectional Copy-Paste SSMIS framework, named WT-BCP, which improves upon the Mean Teacher approach. Our method enhances the understanding of unlabeled data by copying random crops between labeled and unlabeled images, and it employs the Wavelet Transform (WT) to extract LF and HF details. We propose a multi-input, multi-output model named XNet-Plus to receive the fused information after WT. Moreover, consistency training among the multiple outputs helps to mitigate the learning biases introduced by artificial perturbations. During consistency training, the mixed images resulting from WT are fed into both models, with the student model's output being supervised by pseudo-labels and ground truth. Extensive experiments conducted on 2D and 3D datasets confirm the effectiveness of our model. Code: https://github.com/simzhangbest/WT-BCP.
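The two data-side steps described above, copying a random crop in both directions and splitting the mixed image into LF and HF parts, can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the released implementation: the single-level Haar wavelet, the rectangular crop covering half of each side, the summation of the three detail bands into one HF map, and all function names are hypothetical choices made here.

```python
import numpy as np


def random_box_mask(shape, ratio=0.5, rng=None):
    """Binary mask that is 1 inside a randomly placed box and 0 elsewhere."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    bh, bw = int(h * ratio), int(w * ratio)
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    mask = np.zeros(shape, dtype=np.float64)
    mask[top:top + bh, left:left + bw] = 1.0
    return mask


def bidirectional_copy_paste(labeled, unlabeled, mask):
    """Paste the masked crop in both directions between two images."""
    # Labeled crop on an unlabeled background.
    lab_into_unl = labeled * mask + unlabeled * (1.0 - mask)
    # Unlabeled crop on a labeled background.
    unl_into_lab = unlabeled * mask + labeled * (1.0 - mask)
    return lab_into_unl, unl_into_lab


def haar_lf_hf(image):
    """Single-level 2D Haar transform of an image with even height and width.

    Returns the approximation band as LF and the sum of the three detail
    bands as HF (the summation is an assumption of this sketch).
    """
    a = image[0::2, 0::2]
    b = image[0::2, 1::2]
    c = image[1::2, 0::2]
    d = image[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    detail_1 = (a + b - c - d) / 2.0
    detail_2 = (a - b + c - d) / 2.0
    detail_3 = (a - b - c + d) / 2.0
    return approx, detail_1 + detail_2 + detail_3


# Random arrays stand in for one labeled and one unlabeled 2D slice.
rng = np.random.default_rng(0)
labeled = rng.random((8, 8))
unlabeled = rng.random((8, 8))
mask = random_box_mask(labeled.shape, ratio=0.5, rng=rng)
mix_a, mix_b = bidirectional_copy_paste(labeled, unlabeled, mask)
lf, hf = haar_lf_hf(mix_a)
```

In this sketch each mixed image yields an LF map and an HF map at half the spatial resolution, which would then be supplied, together with the image itself, to a multi-input network such as XNet-Plus. The same mask would also be applied to the ground truth and the pseudo-labels so that supervision matches the mixed pixels.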
