Enhancing Feature Fusion of U-like Networks with Dynamic Skip Connections

Yue Cao, Quansong He, Kaishen Wang, Jianlong Xiong, Zhang Yi, Tao He

TL;DR

This work targets limitations of skip connections in U-like networks for medical image segmentation, identifying inter-feature and intra-feature constraints that hinder adaptive feature propagation and multi-scale fusion. It introduces the Dynamic Skip Connection (DSC) block, which combines a Dynamic Multi-Scale Kernel (DMSK) module for adaptive multi-scale feature extraction with a Test-Time Training (TTT) module for inference-time adaptation of skip pathways. The DSC block is architecture-agnostic and is shown to improve performance across CNN-based, Transformer-based, hybrid CNN-Transformer, and Mamba-based backbones on five diverse datasets; ablations confirm the complementary benefits of DMSK and TTT and the advantage of combining multiple kernel scales. These gains come at the cost of additional computation during inference due to the dynamic adaptation, which motivates future work on more efficient implementations. Overall, the DSC block offers a practical, plug-and-play enhancement for robust and accurate medical image segmentation across varied clinical contexts.

Abstract

U-like networks have become fundamental frameworks in medical image segmentation through skip connections that bridge high-level semantics and low-level spatial details. Despite this success, conventional skip connections exhibit two key limitations: inter-feature constraints and intra-feature constraints. The inter-feature constraint refers to the static nature of feature fusion in traditional skip connections, where information is transmitted along fixed pathways regardless of feature content. The intra-feature constraint arises from insufficient modeling of multi-scale feature interactions, which hinders the effective aggregation of global contextual information. To overcome these limitations, we propose a novel Dynamic Skip Connection (DSC) block that enhances cross-layer connectivity through adaptive mechanisms. The DSC block integrates two complementary components: (1) a Test-Time Training (TTT) module, which addresses the inter-feature constraint by dynamically adapting hidden representations during inference, enabling content-aware feature refinement; and (2) a Dynamic Multi-Scale Kernel (DMSK) module, which mitigates the intra-feature constraint by adaptively selecting kernel sizes based on global contextual cues, strengthening the network's capacity for multi-scale feature integration. The DSC block is architecture-agnostic and can be seamlessly incorporated into existing U-like network structures. Extensive experiments demonstrate the plug-and-play effectiveness of the proposed DSC block across CNN-based, Transformer-based, hybrid CNN-Transformer, and Mamba-based U-like networks.
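
The abstract states that DMSK selects kernel sizes from global contextual cues. Below is a minimal NumPy sketch of one way to do this, assuming a selective-kernel design in the spirit of SKNet: parallel depthwise convolutions with different kernel sizes, a globally pooled context vector, and a per-channel softmax that weights the branches. The names `DMSKSketch` and `depthwise_conv2d_same`, the kernel sizes (3, 5, 7), the reduction ratio and the random weights are all hypothetical; the paper's DMSK module may differ in its branches, fusion rule and normalization.

```python
import numpy as np


def depthwise_conv2d_same(x, kernels):
    """Depthwise 2D cross-correlation with zero 'same' padding.

    x: (C, H, W) feature map; kernels: (C, k, k) with odd k, one kernel per channel.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros(x.shape)
    # Accumulate one shifted, per-channel-scaled copy of the input per kernel tap.
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def softmax(a, axis=0):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


class DMSKSketch:
    """Hypothetical selective-kernel style stand-in for the DMSK module."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7), reduction=4, seed=0):
        rng = np.random.default_rng(seed)
        # One depthwise kernel bank per scale (hypothetical random init).
        self.kernels = [rng.normal(0, 1.0 / k, size=(channels, k, k)) for k in kernel_sizes]
        d = max(channels // reduction, 1)
        self.w_squeeze = rng.normal(0, 0.1, size=(d, channels))
        # One projection per branch, producing per-channel selection logits.
        self.w_select = rng.normal(0, 0.1, size=(len(kernel_sizes), channels, d))

    def __call__(self, x):
        branches = np.stack([depthwise_conv2d_same(x, k) for k in self.kernels])  # (K, C, H, W)
        s = branches.sum(axis=0).mean(axis=(1, 2))  # global context descriptor, (C,)
        z = np.maximum(self.w_squeeze @ s, 0.0)      # compressed context, (d,)
        logits = self.w_select @ z                   # (K, C)
        attn = softmax(logits, axis=0)               # per-channel weights over kernel sizes
        out = (attn[:, :, None, None] * branches).sum(axis=0)
        return out, attn


x = np.random.default_rng(1).normal(size=(8, 16, 16))
y, attn = DMSKSketch(channels=8)(x)
print(y.shape, attn.shape)  # (8, 16, 16) (3, 8)
```

The sketch uses a soft (softmax) weighting over kernel sizes rather than a hard choice of a single size, which keeps the selection differentiable; whether DMSK uses soft or hard selection is an assumption here.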

Paper Structure

This paper contains 27 sections, 14 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: (a) illustrates the encoder-decoder framework incorporating Dynamic Skip Connection (DSC) blocks; the DSC block of the $l$-th layer is detailed in (b). The left component (b.L) is the Dynamic Multi-Scale Kernel (DMSK) module and the right component (b.R) is the Test-Time Training (TTT) module. Input features representing skin lesions pass through successive U-Net layers as $x^1_{in}, x^2_{in}, \dots, x^n_{in}$. The DMSK module adaptively selects kernel sizes based on global context to capture multi-scale features, while the TTT module dynamically adjusts connection weights through test-time training, adaptively propagating features to the corresponding decoder layers.
  • Figure 2: Visualized segmentation examples: cell segmentation in microscopy images (1st row), MedNeXt network + DSC block (2nd row), instrument segmentation in endoscopy images (3rd row), and U-Mamba network + DSC block (4th row).
  • Figure 3: Visualized examples of abdominal organ segmentation in CT (a) and MRI (b).
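
Figure 1 describes the TTT module as adjusting skip-pathway weights through test-time training. Below is a minimal NumPy sketch of that idea, assuming the module follows the TTT-Linear formulation of Sun et al. (Learning to (Learn at Test Time)): the hidden state is the weight matrix W of a small linear model, and each spatial token triggers one gradient step on the self-supervised loss ||W θ_K x - θ_V x||² before the output W θ_Q x is read out. The class `TTTLinearSketch`, the helper `ttt_skip`, the identity initialization of W, the learning rate and the random projections are hypothetical; the paper's module may use mini-batched updates, learned projections, normalization or a different inner model, and in Figure 1 its input would be the DMSK output rather than a raw feature map.

```python
import numpy as np


class TTTLinearSketch:
    """Hypothetical TTT-Linear style layer: the hidden state W is trained at inference."""

    def __init__(self, dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Hypothetical fixed projections: training view (K), label view (V), test view (Q).
        self.theta_k = rng.normal(0, scale, size=(dim, dim))
        self.theta_v = rng.normal(0, scale, size=(dim, dim))
        self.theta_q = rng.normal(0, scale, size=(dim, dim))
        self.w0 = np.eye(dim)  # initial hidden state (inner linear model weights)
        self.lr = lr

    def __call__(self, tokens):
        """tokens: (N, D). Returns outputs (N, D) and the per-token inner losses."""
        w = self.w0.copy()  # the hidden state restarts from w0 for every input
        outputs, losses = [], []
        for x in tokens:
            k, v, q = self.theta_k @ x, self.theta_v @ x, self.theta_q @ x
            err = w @ k - v                  # reconstruction residual
            losses.append(float(err @ err))  # inner loss before this update
            grad = 2.0 * np.outer(err, k)    # gradient of ||W k - v||^2 w.r.t. W
            w = w - self.lr * grad           # one gradient step = hidden-state update
            outputs.append(w @ q)            # read out with the updated hidden state
        return np.stack(outputs), losses


def ttt_skip(feature_map, ttt):
    """Apply the TTT layer to a (C, H, W) skip feature via raster-order tokens."""
    c, h, w = feature_map.shape
    tokens = feature_map.reshape(c, h * w).T  # (H*W, C)
    out, _ = ttt(tokens)
    return out.T.reshape(c, h, w)


fmap = np.random.default_rng(1).normal(size=(8, 4, 4))
refined = ttt_skip(fmap, TTTLinearSketch(dim=8, lr=0.01))
print(refined.shape)  # (8, 4, 4)
```

The per-token loop makes the inference-time cost concrete: in this sketch every skip connection performs H×W small gradient steps, the kind of overhead the TL;DR flags as a limitation of the dynamic adaptation.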