U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation

Yaopeng Peng; Milan Sonka; Danny Z. Chen

TL;DR

U-Net v2 introduces a semantics and detail infusion (SDI) module that enriches the feature map at each encoder level with semantic information from higher levels and finer details from lower levels, fused across levels through a Hadamard product. The SDI workflow applies CBAM-style spatial and channel attention, reduces channels with a 1×1 convolution, and then resizes and smooths the features from the other levels before fusion, so the module can be dropped into any encoder-decoder architecture. Evaluations on ISIC skin lesion and polyp segmentation datasets report consistent gains over state-of-the-art methods while keeping computation and memory usage favorable. The approach is a lightweight, end-to-end trainable replacement for plain skip connections that improves boundary accuracy and detail preservation in medical image segmentation.
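The fusion described above can be written compactly. The following is a sketch in illustrative notation, not necessarily the symbols used in the paper's own equations: $f_j$ is the encoder feature at level $j$, $\phi^{s}_j$ and $\phi^{c}_j$ are the spatial and channel attention, $R_i$ resizes a map to the resolution of level $i$, $\theta_{ij}$ is the $3\times 3$ smoothing convolution, and $M$ is the number of levels. The order in which the two attention steps are applied is an assumption.

$$
g_j = \mathrm{Conv}_{1\times 1}\big(\phi^{c}_j(\phi^{s}_j(f_j))\big), \qquad
\hat{f}_i = \bigotimes_{j=1}^{M} \theta_{ij}\big(R_i(g_j)\big)
$$

Here $\bigotimes$ is the element-wise (Hadamard) product taken over all $M$ resized and smoothed maps, so $\hat{f}_i$ has the same spatial size as level $i$ and is the feature passed to the decoder at that level.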

Abstract

In this paper, we introduce U-Net v2, a new robust and efficient U-Net variant for medical image segmentation. It aims to augment the infusion of semantic information into low-level features while simultaneously refining high-level features with finer details. For an input image, we begin by extracting multi-level features with a deep neural network encoder. Next, we enhance the feature map of each level by infusing semantic information from higher-level features and integrating finer details from lower-level features through the Hadamard product. Our novel skip connections give the features at every level both enriched semantic characteristics and intricate details. The improved features are then passed to the decoder for further processing and segmentation. Our method can be seamlessly integrated into any Encoder-Decoder network. We evaluate our method on several public medical image segmentation datasets for skin lesion segmentation and polyp segmentation, and the experimental results demonstrate the improved segmentation accuracy of our new method over state-of-the-art methods, while preserving memory and computational efficiency. Code is available at: https://github.com/yaoppeng/U-Net_v2
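To make the cross-level Hadamard fusion concrete, here is a minimal pure-Python sketch. It is not the authors' implementation (see the repository linked above for that). All function names are hypothetical, each level is reduced to a single-channel 2D map, nearest-neighbour sampling stands in for whatever resizing the paper uses, a fixed 3×3 mean filter stands in for the learned SmoothConv, and the attention and 1×1 convolution steps are omitted.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map


def resize_nearest(feat: Grid, out_h: int, out_w: int) -> Grid:
    """Resize a 2D map to (out_h, out_w) by nearest-neighbour sampling.

    Assumption: stands in for the paper's down/up-sampling operators.
    """
    in_h, in_w = len(feat), len(feat[0])
    return [
        [feat[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def smooth3x3(feat: Grid) -> Grid:
    """3x3 mean filter (assumption: placeholder for the learned SmoothConv).

    Border cells are averaged over the neighbours that exist.
    """
    h, w = len(feat), len(feat[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [
                feat[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def hadamard(a: Grid, b: Grid) -> Grid:
    """Element-wise product of two maps of identical shape."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sdi_fuse(features: List[Grid], target: int) -> Grid:
    """Fuse all levels into the resolution of features[target].

    Every level is resized to the target size, smoothed, and then
    multiplied element-wise into the running result.
    """
    h, w = len(features[target]), len(features[target][0])
    fused = [[1.0] * w for _ in range(h)]
    for feat in features:
        fused = hadamard(fused, smooth3x3(resize_nearest(feat, h, w)))
    return fused


# Three levels at 4x4, 2x2 and 1x1, fused at the middle level.
levels = [
    [[2.0] * 4 for _ in range(4)],
    [[3.0] * 2 for _ in range(2)],
    [[0.5]],
]
print(sdi_fuse(levels, target=1))  # [[3.0, 3.0], [3.0, 3.0]]
```

Because the fusion is a product rather than a sum or concatenation, a level with a near-zero response at some location suppresses that location in the fused map. The output also keeps the target level's shape, so the decoder input sizes are unchanged.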

Paper Structure (12 sections, 4 equations, 2 figures, 4 tables)

Introduction
Method
Overall Architecture
Semantics and Detail Infusion (SDI) Module
Experiments
Datasets
Experimental Setup
Results and Analysis
Ablation Study
Qualitative Results
Computation, GPU Memory, and Inference Time
Conclusions

Figures (2)

Figure 1: (a) The overall architecture of our U-Net v2 model, which consists of an Encoder, the SDI (semantics and detail infusion) module, and a Decoder. (b) The architecture of the SDI module. For simplicity, we only show the refinement of the third-level features ($l=3$). SmoothConv denotes a $3\times 3$ convolution for feature smoothing. $\bigotimes$ denotes the Hadamard product.
Figure 2: Example segmentations from the ISIC 2017 dataset. We use PVT as the encoder for U-Net and UNet++.
