ProSMA-UNet: Decoder Conditioning for Proximal-Sparse Skip Feature Selection

Chun-Wun Cheng, Yanqi Cheng, Peiyuan Jing, Guang Yang, Javier A. Montoya-Zegarra, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

TL;DR

ProSMA-UNet (Proximal-Sparse Multi-Scale Attention U-Net) reformulates skip gating as a decoder-conditioned sparse feature selection problem and adds decoder-conditioned channel gating driven by global decoder context.

Abstract

Medical image segmentation commonly relies on U-shaped encoder-decoder architectures such as U-Net, where skip connections preserve fine spatial detail by injecting high-resolution encoder features into the decoder. However, these skip pathways also propagate low-level textures, background clutter, and acquisition noise, allowing irrelevant information to bypass deeper semantic filtering -- an issue that is particularly detrimental in low-contrast clinical imaging. Although attention gates have been introduced to address this limitation, they typically produce dense sigmoid masks that softly reweight features rather than explicitly removing irrelevant activations. We propose ProSMA-UNet (Proximal-Sparse Multi-Scale Attention U-Net), which reformulates skip gating as a decoder-conditioned sparse feature selection problem. ProSMA constructs a multi-scale compatibility field using lightweight depthwise dilated convolutions to capture relevance across local and contextual scales, then enforces explicit sparsity via an $\ell_1$ proximal operator with learnable per-channel thresholds, yielding a closed-form soft-thresholding gate that can remove noisy responses. To further suppress semantically irrelevant channels, ProSMA incorporates decoder-conditioned channel gating driven by global decoder context. Extensive experiments on challenging 2D and 3D benchmarks demonstrate state-of-the-art performance, with particularly large gains ($\approx20$\%) on difficult 3D segmentation tasks. Project page: https://math-ml-x.github.io/ProSMA-UNet/
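
The soft-thresholding gate described in the abstract can be illustrated with a minimal sketch. It assumes the compatibility field is already computed and stored as a nested list indexed `[row][col][channel]`. The names `soft_threshold` and `sparse_gate` are hypothetical, and the thresholds are fixed numbers here rather than learned parameters. How the thresholded field then modulates the skip features is not shown.

```python
from typing import List

# Hypothetical container for a compatibility field, indexed [row][col][channel].
Field = List[List[List[float]]]


def soft_threshold(value: float, lam: float) -> float:
    """Closed-form l1 proximal operator: sign(v) * max(|v| - lam, 0)."""
    magnitude = abs(value) - lam
    if magnitude <= 0.0:
        return 0.0  # responses at or below the threshold become exact zeros
    return magnitude if value > 0 else -magnitude


def sparse_gate(field: Field, thresholds: List[float]) -> Field:
    """Apply soft-thresholding with one threshold per channel (assumed fixed here)."""
    return [
        [
            [soft_threshold(v, lam) for v, lam in zip(pixel, thresholds)]
            for pixel in row
        ]
        for row in field
    ]


# A 1 x 2 field with 2 channels; channel 0 uses threshold 0.25, channel 1 uses 0.5.
field = [[[1.0, 0.25], [-0.125, -1.5]]]
gated = sparse_gate(field, thresholds=[0.25, 0.5])
print(gated)
```

Responses whose magnitude does not exceed their channel's threshold are set to exactly zero, while larger responses are shrunk toward zero by the threshold amount.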

Paper Structure

This paper contains 8 sections, 1 theorem, 5 equations, 2 figures, 3 tables.

Key Result

Theorem 1

Let $u \in \mathbb{R}^{H \times W \times C_a}$ be the multi-scale compatibility field defined in Eq, and let $\lambda \in \mathbb{R}_{+}^{C_a}$ be the per-channel sparsity threshold defined in eq:l1_prox. Consider the proximal sparse gating problem $z = \arg\min_{z \in \mathbb{R}^{H \times W \times \cdots}} \cdots$
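
For reference, the textbook $\ell_1$ proximal problem and its closed-form solution are written out below using the symbols $u$ and $\lambda$ from the statement. The quadratic data term, the notation $z^\star$, and the index convention $(h, w, c)$ are assumptions taken from the standard formulation and are not a quotation of the paper's theorem.

$$
z^\star = \arg\min_{z} \; \tfrac{1}{2}\,\|z - u\|_2^2 + \sum_{c=1}^{C_a} \lambda_c \,\|z_{:,:,c}\|_1,
\qquad
z^\star_{h,w,c} = \operatorname{sign}(u_{h,w,c})\,\max\bigl(|u_{h,w,c}| - \lambda_c,\; 0\bigr).
$$

In this standard form, $z^\star_{h,w,c} = 0$ exactly when $|u_{h,w,c}| \le \lambda_c$. The map $u \mapsto z^\star$ is also nonexpansive, so $\|z^\star(u) - z^\star(u')\|_2 \le \|u - u'\|_2$. These two properties are the usual sense in which soft-thresholding gives exact selection and stability.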

Figures (2)

  • Figure 1: ProSMA-UNet motivation and overview. (a) Conventional attention gates generate a dense soft mask (sigmoid reweighting), which can still pass weak but harmful skip activations (noise features) into the decoder. (b) Our ProSMA sparse gating constructs a multi-scale decoder--encoder compatibility field and applies an $\ell_1$ proximal (soft-thresholding) operator to induce explicit sparsity (exact zeros), enabling direct removal of irrelevant skip responses. (c) ProSMA-UNet integrates the proposed sparse gating module at each skip connection to condition high-resolution feature transfer on decoder context. (d) Overview of ProSMA Sparse Gating.
  • Figure 2: Qualitative comparison of segmentation masks produced by P-UNET and baseline methods across 2D and 3D medical imaging datasets.
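
The contrast drawn in Figure 1 (a) and (b) can be checked numerically with a small sketch. The response values and the threshold of 0.5 are made up for illustration. The sketch compares a sigmoid mask, which is strictly positive for any finite input, with soft-thresholding, which maps small responses to exact zeros.

```python
import math


def sigmoid(x: float) -> float:
    """Dense gate: output lies strictly between 0 and 1 for finite inputs."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_threshold(x: float, lam: float) -> float:
    """Sparse gate: sign(x) * max(|x| - lam, 0)."""
    magnitude = abs(x) - lam
    if magnitude <= 0.0:
        return 0.0
    return magnitude if x > 0 else -magnitude


# Illustrative compatibility responses (hypothetical values).
responses = [-2.0, -0.1, 0.05, 0.3, 1.5]

dense_mask = [sigmoid(r) for r in responses]
sparse_out = [soft_threshold(r, 0.5) for r in responses]

# Count how many entries each gate sets to exactly zero.
zeros_dense = sum(1 for m in dense_mask if m == 0.0)
zeros_sparse = sum(1 for s in sparse_out if s == 0.0)

print("dense mask:", [round(m, 3) for m in dense_mask], "exact zeros:", zeros_dense)
print("sparse out:", sparse_out, "exact zeros:", zeros_sparse)
```

The sigmoid mask still assigns nonzero weight to the three weak responses, whereas soft-thresholding removes them and keeps only the two strong ones in shrunken form.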

Theorems & Definitions (2)

  • Theorem 1: Exact Feature Selection and Stability of Proximal Sparse Gating
  • Proof