SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation

Wenhui Zhu, Xiwen Chen, Peijie Qiu, Mohammad Farazi, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

TL;DR

The paper addresses two problems in UNet-based medical image segmentation: asymmetric supervision and feature redundancy, both of which degrade semantic fidelity. It analyzes standard UNet and SwinUnet using Grad-CAM and feature-map similarity, and finds that the decoder receives stronger semantic guidance than the encoder, which can lead to semantic loss. To remedy this, it introduces semantic consistency regularization ($\mathcal{L}_{SCR}$) and internal feature distillation ($\mathcal{L}_{IFD}$). The final decoder feature map $F_{final}$ supervises earlier blocks, and information is distilled across channel hierarchies; both terms are added to the base loss $\mathcal{L}_{cd}$. Experiments on four datasets (Synapse, ACDC, GlaS, MoNuSeg) show consistent improvements for both UNet and SwinUnet with minimal overhead.
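
Read as a formula, the integration with the base loss can be sketched as a weighted sum. This is only an illustrative reading under stated assumptions: the weights $\lambda_1$ and $\lambda_2$, the block index $i$, the per-block feature map $F_i$, the alignment map $\phi_i$ (e.g., pooling or channel selection to match shapes), and the distance $d$ are placeholder names that do not appear in this summary, and the paper's own equations may define the terms differently.

$$
\mathcal{L}_{total} = \mathcal{L}_{cd} + \lambda_1 \mathcal{L}_{SCR} + \lambda_2 \mathcal{L}_{IFD},
\qquad
\mathcal{L}_{SCR} \approx \sum_{i} d\big(\phi_i(F_i),\, \phi_i(F_{final})\big)
$$

Under this reading, $\mathcal{L}_{SCR}$ pulls each earlier block toward the final decoder feature map, while $\mathcal{L}_{IFD}$ would be an analogous distance computed between channel groups inside a single block.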

Abstract

Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have been dedicated to improving the performance of the standard UNet, few have conducted in-depth analyses of the patterns that UNet attends to in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important factors that potentially affect its performance: (i) irrelevant features learned as a result of asymmetric supervision; (ii) feature redundancy in the feature map. To this end, we propose to balance the supervision between the encoder and decoder and to reduce the redundant information in the UNet. Specifically, we use the feature map that contains the most semantic information (i.e., the last layer of the decoder) to provide additional supervision to the other blocks, and we reduce feature redundancy by leveraging feature distillation. The proposed method can be easily integrated into existing UNet architectures in a plug-and-play fashion with negligible computational cost. The experimental results suggest that the proposed method consistently improves the performance of standard UNets on four medical image segmentation datasets. The code is available at https://github.com/ChongQingNoSubway/SelfReg-UNet

Paper Structure (11 sections, 3 equations, 5 figures, 3 tables)

This paper contains 11 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) UNet structure. (b) Attention maps in ViT/CNN-based UNet for each encoder and decoder block. (For more examples, refer to supplementary Appendix A.) (c) ViT/CNN-based UNet feature similarity matrix between shallow (left) and deeper (right) channels.
  • Figure 2: Demonstrating the feature-based operations for (a) semantic consistency regularization and (b) internal feature distillation.
  • Figure 3: Comparison of segmentation performance on the Synapse dataset.
  • Figure 4: Comparison of segmentation performance on the GlaS and MoNuSeg datasets.
  • Figure 5: Ablation study of the balance parameters $\lambda$ and the loss terms on GlaS and MoNuSeg.