FuseUNet: A Multi-Scale Feature Fusion Method for U-like Networks

Quansong He, Xiangde Min, Kaishen Wang, Tao He

TL;DR

The paper tackles two limitations of traditional UNet skip connections: they lack cross‑scale interaction and they rely on simple fusion. It introduces FuseUNet, a multi‑scale feature fusion framework that models decoding as an initial value problem (IVP) and uses adaptive nmODEs driven by linear multistep methods to fuse features across scales. The approach is independent of the encoder and decoder architectures and is shown to significantly reduce parameters and FLOPs while maintaining segmentation accuracy across 3D and 2D medical imaging datasets. The work connects skip connections to numerical integration theory, offering a rigorous, interpretable foundation for cross‑layer information propagation, and it identifies memory cost as an area for future improvement.

Abstract

Medical image segmentation is a critical task in computer vision, with UNet serving as a milestone architecture. The skip connection is the characteristic component of the UNet family, but it faces two significant limitations: (1) skip connections lack effective interaction between features at different scales, and (2) they rely on simple concatenation or addition operations, which constrain efficient information integration. While recent improvements to UNet have focused on enhancing encoder and decoder capabilities, these limitations remain overlooked. To overcome these challenges, we propose a novel multi-scale feature fusion method that reimagines the UNet decoding process as solving an initial value problem (IVP), treating skip connections as discrete nodes. By leveraging principles from the linear multistep method, we propose an adaptive ordinary differential equation method to enable effective multi-scale feature fusion. Our approach is independent of the encoder and decoder architectures, making it adaptable to various U-Net-like networks. Experiments on ACDC, KiTS2023, MSD brain tumor, and ISIC2017/2018 skin lesion segmentation datasets demonstrate improved feature utilization, reduced network parameters, and maintained high performance. The code is available at https://github.com/nayutayuki/FuseUNet.
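
To make the IVP view more concrete, here is a toy sketch in which each skip connection is treated as one node on a time axis and a running state is advanced with an explicit multistep update that draws on the derivatives at all nodes seen so far. This is not the authors' implementation. The names `fuse_skips`, `toy_rhs`, and `AB_COEFFS`, the step size, and the right-hand side are hypothetical. The sketch also assumes that all skip features already share one shape, which real multi-scale features do not, and it uses only explicit Adams-Bashforth updates.

```python
# Adams-Bashforth coefficients, newest derivative first, for orders 1 to 3.
AB_COEFFS = {
    1: [1.0],
    2: [3 / 2, -1 / 2],
    3: [23 / 12, -16 / 12, 5 / 12],
}


def toy_rhs(y, x):
    """Hypothetical derivative: pull the state y toward the skip feature x."""
    return [xi - yi for yi, xi in zip(y, x)]


def fuse_skips(skip_feats, y0, delta=0.5, rhs=toy_rhs):
    """Advance the state through all skip features, one explicit update per node."""
    y = list(y0)
    history = []  # derivatives at past nodes, newest first
    for x in skip_feats:
        history.insert(0, rhs(y, x))
        # Use a higher-order formula as more nodes become available (capped at 3).
        order = min(len(history), 3)
        coeffs = AB_COEFFS[order]
        y = [
            yi + delta * sum(c * d[i] for c, d in zip(coeffs, history))
            for i, yi in enumerate(y)
        ]
    return y


# Three toy "skip features" that, by assumption, already share one shape.
skips = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
fused = fuse_skips(skips, [0.0, 0.0])
```

With a single node the update reduces to one Euler step. Each later node reuses the stored derivatives of earlier nodes, which is the sense in which information from several scales enters every update.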

Paper Structure

This paper contains 20 sections, 9 theorems, 19 equations, 12 figures, 12 tables, and 1 algorithm.

Key Result

Theorem 3.1

Linear Multistep Method [bashforthmoulton]. Given the initial value problem $\dot{y}(t)=F(t, y(t))$, $y(t_0) = y_0$, choose a step size $\delta$ along the $t$-axis and set $t_{n+i}=t_n+i \cdot \delta$. The result is a sequence of approximations $y_i \approx y(t_i)$. Multistep methods use information from the previous $s$ steps to compute the next value: $\sum_{j=0}^{s} a_j y_{n+j} = \delta \sum_{j=0}^{s} b_j F(t_{n+j}, y_{n+j})$, with $a_s=1$. The coefficients $a_0,\dotsc ,a_{s-1}$ and $b_0,\dotsc ,b_{s}$ determine the method.
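
As a concrete instance of a linear multistep method, the two-step Adams-Bashforth scheme $y_{n+2} = y_{n+1} + \delta\,(\tfrac{3}{2}F_{n+1} - \tfrac{1}{2}F_n)$ corresponds to $s=2$ with $a_2=1$, $a_1=-1$, $a_0=0$, $b_2=0$, $b_1=\tfrac{3}{2}$, $b_0=-\tfrac{1}{2}$. The sketch below applies it to the test problem $\dot{y}=-y$, $y(0)=1$, chosen here purely for illustration. The function names and the single Euler step used to obtain the second starting value are assumptions, not taken from the paper.

```python
import math


def adams_bashforth2(F, t0, y0, delta, n_steps):
    """Integrate y' = F(t, y) with the explicit two-step Adams-Bashforth method."""
    ts = [t0]
    ys = [y0]
    # A two-step method needs two starting values; take one Euler step to get the second.
    f_prev = F(t0, y0)
    ys.append(y0 + delta * f_prev)
    ts.append(t0 + delta)
    for n in range(1, n_steps):
        f_curr = F(ts[n], ys[n])
        # y_{n+1} = y_n + delta * (3/2 * F_n - 1/2 * F_{n-1})
        ys.append(ys[n] + delta * (1.5 * f_curr - 0.5 * f_prev))
        ts.append(t0 + (n + 1) * delta)
        f_prev = f_curr
    return ts, ys


def decay(t, y):
    """Right-hand side of the test problem y' = -y."""
    return -y


ts, ys = adams_bashforth2(decay, 0.0, 1.0, 0.01, 100)
error = abs(ys[-1] - math.exp(-1.0))  # compare with the exact solution at t = 1
```

Each update reuses the derivative from the previous step, so the method needs only one new evaluation of $F$ per step while achieving second-order accuracy.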

Figures (12)

  • Figure 1: The traditional U-Net architecture, in which skip connections only communicate information at the same scale.
  • Figure 2: (a) The architecture of an $L$-stage U-Net incorporating discrete nmODEs. Here, $P$-$C$ represents the Predictor-Corrector module, with its internal structure detailed in (b). $C$ denotes the calculator used in the final step, which exclusively employs explicit methods, and its internal structure is illustrated in (c). The number of channels in $Y$ is set to twice the number of target classes, while all other dimensions remain consistent with the original input. The internal structure of the nmODEs block is shown in (d), where the function $f$ executes the corresponding operations based on the specific network architecture, such as convolution, Transformer, or Mamba.
  • Figure 3: Visualization on the ACDC dataset
  • Figure 4: Visualization on the KiTS dataset
  • Figure 5: Visualization on the MSD dataset
  • ...and 7 more figures

Theorems & Definitions (9)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5
  • Theorem 1.6
  • Theorem 1.7