Lean Unet: A Compact Model for Image Segmentation

Ture Hassler, Ida Åkerholm, Marcus Nordström, Gabriele Balletti, Orcun Goksel

TL;DR

This work tackles the memory and latency challenges of Unet-based medical image segmentation by examining whether pruning improvements stem from channel selection or architectural choices. It analyzes gradual channel pruning (STAMP) and demonstrates that the resulting architecture, particularly a flat, fixed-channel design, is the critical factor. The authors introduce Lean Unet (LUnet), a compact architecture with uniform channel counts across blocks, which achieves competitive performance with markedly fewer parameters (over 30x reduction) and matches or exceeds pruned networks under similar parameter budgets. The findings suggest that a lean, data-agnostic architecture can outperform traditional Unet variants and certain pruning strategies, highlighting the importance of architectural design in efficient segmentation models.

Abstract

Unet and its variations have been standard in semantic image segmentation, especially for computer-assisted radiology. Current Unet architectures iteratively downsample spatial resolution while increasing channel dimensions to preserve information content. Such a structure demands a large memory footprint, limiting training batch sizes and increasing inference latency. Channel pruning can compress a Unet architecture without accuracy loss, but it requires lengthy optimization and may not generalize across tasks and datasets. By investigating Unet pruning, we hypothesize that the final structure is the crucial factor, not the channel selection strategy of pruning. Based on our observations, we propose a lean Unet architecture (LUnet) with a compact, flat hierarchy where channels are not doubled as resolution is halved. We evaluate on a public MRI dataset allowing comparable reporting, as well as on two internal CT datasets. We show that a state-of-the-art pruning solution (STAMP) mainly prunes from the layers with the highest number of channels. In comparison, simply eliminating a random channel at the pruning-identified layer or at the largest layer achieves similar or better performance. Our proposed LUnet, with a fixed architecture and over 30 times fewer parameters, achieves performance comparable to both conventional Unet counterparts and data-adaptively pruned networks. The proposed lean Unet with constant channel count across layers requires far fewer parameters while achieving performance superior to a standard Unet with the same total number of parameters. Skip connections allow Unet bottleneck channels to be greatly reduced, unlike standard encoder-decoder architectures, which require increased bottleneck channels for information propagation.
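To make the parameter argument concrete, the following is a minimal sketch that counts convolution parameters for a channel-doubling Unet and for a flat, constant-width variant. It is not the authors' implementation. Everything here is an assumption for illustration: the function names, 2D 3x3 convolutions with biases, two convolutions per block, parameter-free upsampling, concatenated skip connections, four levels counted including the bottleneck, one input channel, and two output classes. The resulting ratio depends on these choices and should not be read as the reduction reported in the paper.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one 2D convolution with a k x k kernel."""
    return c_in * c_out * k * k + c_out


def unet_conv_params(n_f, levels=4, in_ch=1, n_classes=2, double=True):
    """Count convolution parameters of a simplified Unet (hypothetical layout).

    double=True  : channels double at each deeper level (conventional Unet).
    double=False : every block keeps n_f channels (flat, LUnet-like layout).
    """
    widths = [n_f * (2 ** level if double else 1) for level in range(levels)]
    total = 0
    prev = in_ch
    # Encoder blocks; the last entry of `widths` acts as the bottleneck.
    for w in widths:
        total += conv_params(prev, w) + conv_params(w, w)
        prev = w
    # Decoder blocks: input is the upsampled features concatenated with the skip.
    for w in reversed(widths[:-1]):
        total += conv_params(prev + w, w) + conv_params(w, w)
        prev = w
    # Final 1x1 convolution mapping to class scores.
    total += conv_params(prev, n_classes, k=1)
    return total


if __name__ == "__main__":
    doubled = unet_conv_params(16, double=True)
    flat = unet_conv_params(16, double=False)
    print(f"doubling: {doubled}, flat: {flat}, ratio: {doubled / flat:.1f}x")
```

Under these assumptions most parameters of the doubling variant sit in the deepest, widest blocks, which is why keeping the width constant shrinks the model so strongly.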

Paper Structure

This paper contains 14 sections, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Visualization of pruning strategies on two convolution channels with 3x3 filters and 3 channel input: dense (top), unstructured parameter pruning (left), and structured channel pruning (right).
  • Figure 2: Visualization of a regular Unet architecture, with $N_\text{f}$ starting convolution filters, two convolutions per block, and a depth of four levels. LUnet architecture follows the same structure, except that the number of convolution channels per block remains constant across network levels rather than doubling at each successive level.
  • Figure 3: Pruning on HarP for 200$\rightarrow$70 train-test split. (a) Test Dice score evaluated during training with STAMP set compared with the dense Unet baseline (Unet$_{100\%}$). (b) Remaining channels at each Unet encoder-decoder subpart during pruning. (c) Distribution of channels per block at three sample pruning steps when {75,50,25}% of total channels remain.
  • Figure 4: Dice comparisons of strategies. (a) Pruning compared to the baseline dense Unet$_{100\%}$ and linearly-scaled-down models Unet$_{\{75\%,50\%,25\%\}}$ (orange) as well as equivalent-sized pruned networks after retraining from random initializations (green). (b) Pruning a random channel from STAMP-determined block (orange), compared to pruning based on channel activations (blue). Curves show the median values, with min/max ranges of three repeated runs shaded.
  • Figure 5: Sample results on the Tracheal Tree (TT) dataset: (a) Distribution of pruning criterion (normalized activations) per Unet level at 95% channels remaining. (b) Remaining channels at each Unet encoder-decoder subpart during pruning. (c) Pruning compared to the baseline dense and linearly-scaled-down Unet$_{\{100\%,75\%,50\%,25\%\}}$ (orange) as well as equivalent-sized pruned networks after retraining from random initializations (green). (d) Distribution of channels per block at the pruning steps when {75,50,25}% of total channels remain.
  • ...and 2 more figures
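The captions for Figures 3 and 5 refer to the distribution of channels per block when {75,50,25}% of total channels remain, and the abstract notes that pruning mainly removes channels from the widest layers. The sketch below only illustrates the bookkeeping of such a "widest block first" rule on channel counts. It is not STAMP and involves no training or activation statistics. The function name `prune_widest_first` and the starting block widths are hypothetical. Since only counts are tracked, which channel inside the block is removed makes no difference here.

```python
def prune_widest_first(channels, keep_fraction):
    """Remove one channel at a time from the currently widest block.

    channels      : list of per-block channel counts (hypothetical example input).
    keep_fraction : fraction of the total channel count to keep, e.g. 0.5.
    Returns the per-block channel counts after pruning.
    """
    counts = list(channels)
    target = round(sum(counts) * keep_fraction)
    while sum(counts) > target:
        # Index of the widest block; ties resolve to the first such block.
        widest = max(range(len(counts)), key=lambda i: counts[i])
        if counts[widest] <= 1:
            break  # never remove the last channel of a block
        counts[widest] -= 1
    return counts


if __name__ == "__main__":
    # Assumed widths of a channel-doubling encoder, bottleneck, and decoder.
    blocks = [16, 32, 64, 128, 64, 32, 16]
    for fraction in (0.75, 0.50, 0.25):
        print(fraction, prune_widest_first(blocks, fraction))
```

With this rule the per-block counts move toward a flat profile as more channels are removed, which matches the intuition behind a constant-width design.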