Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition

Tobias Weber, Jakob Dexl, David Rügamer, Michael Ingrisch

TL;DR

The paper tackles the computational barrier of deploying 3D medical image segmentation models by applying a post-training Tucker decomposition to compress the TotalSegmentator network. Replacing 3D convolution kernels with a Tucker-based sequence of three convolutions yields substantial parameter and FLOP reductions (up to ~88% of parameters removed) while preserving segmentation accuracy for the majority of classes after fine-tuning. Inference speed-ups depend on the GPU architecture and are more pronounced on less powerful hardware. The approach is evaluated on two model resolutions (1.5mm and 3mm) across a range of downsampling factors and is contrasted against filter pruning baselines. The work highlights the practical potential of tensor-factorization-based compression to broaden access to advanced medical image analysis in clinical settings, while acknowledging that gains are hardware-dependent and pointing to future improvements such as layer-specific rank selection and broader architectural generalization.

Abstract

We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model's parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.
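
As a rough illustration of where the savings come from, the sketch below counts the weights of a single 3D convolution before and after it is replaced by the three-convolution sequence (a 1x1x1 projection, a KxKxK convolution in the reduced channel space, and a 1x1x1 back-projection). The channel counts and ranks are hypothetical values chosen for illustration, not settings reported in the paper, and biases are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a dense 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3


def tucker_conv3d_params(c_in, c_out, k, r_in, r_out):
    """Weights in the factorized sequence of three convolutions."""
    project_in = c_in * r_in          # 1x1x1 conv: c_in -> r_in channels
    core = r_in * r_out * k ** 3      # k x k x k conv in the reduced space
    project_out = r_out * c_out       # 1x1x1 conv: r_out -> c_out channels
    return project_in + core + project_out


# Hypothetical layer: 256 -> 256 channels, 3x3x3 kernel, ranks of 64 (assumed).
original = conv3d_params(256, 256, 3)
compressed = tucker_conv3d_params(256, 256, 3, r_in=64, r_out=64)

compression_ratio = original / compressed      # original size / compressed size
fraction_removed = 1 - compressed / original   # share of weights removed

print(f"original:   {original}")
print(f"compressed: {compressed}")
print(f"compression ratio: {compression_ratio:.2f}")
print(f"fraction removed:  {fraction_removed:.1%}")
```

The same count, multiplied by the number of output voxels, gives a first-order estimate of multiply-accumulate operations. That estimate says nothing about measured runtime, which the abstract notes varies with the GPU architecture.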

Paper Structure (32 sections, 6 equations, 10 figures, 11 tables)

Figures (10)

  • Figure 1: Schematic overview of the Tucker-decomposed convolution operation. The top row shows the original convolution with a $K \times K \times K$ kernel. The Tucker-decomposed convolution (bottom row) achieves its efficiency by first projecting each voxel of the input tensor into a space with substantially fewer channels using a $1 \times 1 \times 1$ kernel convolution and then performing the (otherwise) costly spatial convolution in this reduced representation space. Subsequently, the tensor is projected back into the original output channel domain. Note that the spatial dimensions $H \times W \times D$ are represented by a single dimension for visual purposes.
  • Figure 2: Dice score aggregated over all classes for the TS test set using the 1.5mm (left) and 3mm (right) TS models. The performance of the original TS model is compared against the Tucker decomposition-based approach (red) and filter pruning (blue). Both compression methods are evaluated with (solid line) and without (dashed line) additional fine-tuning. Error bars represent the standard deviation across different classes.
  • Figure 3: Visualization of segmentation performance across different compression methods (columns) applied to an abdominal CT image. The rows show the achieved compression ratios, which were determined by dividing the original model size by that of the compressed model size. The segmented classes include spleen (green), right kidney (pink), left kidney (orange), gallbladder (purple), and liver (brown). The ground truth (column one) remains constant across all evaluated compression ratios and acts as a benchmark for comparison. Columns two and three demonstrate that Tucker compression achieved noteworthy segmentation performance even for high CRs. Zero-shot Tucker compression introduced artifacts at higher CRs, a limitation that was not observed with fine-tuned Tucker compression. In contrast, the segmentation performances of both pruning approaches deteriorated rapidly (columns four and five).
  • Figure 4: Visualization of segmentation accuracy across different compression methods (columns) applied to a thoracic CT image. The rows show the achieved compression ratios, which were determined by dividing the original model size by that of the compressed model size. The segmented classes include lung_upper_lobe_left (green), lung_lower_lobe_left (pink), lung_upper_lobe_right (orange), lung_middle_lobe_right (purple), lung_lower_lobe_right (brown). The ground truth (column one) remains constant across all evaluated compression ratios and acts as a benchmark for comparison. Columns two and three demonstrate that Tucker compression achieved noteworthy segmentation performance even for high CRs. Zero-shot Tucker compression introduced artifacts at higher CRs, a limitation that was not observed with fine-tuned Tucker compression. In contrast, the segmentation performances of both pruning approaches deteriorated rapidly (columns four and five). Notably, all models failed to segment the pathology in lung_lower_lobe_left accurately; this is a general property of the TS package, which is trained to segment normal anatomy.
  • Figure 5: Difference in achieved Dice scores between the compressed and original model for each segmentation group (colors) across different downsampling factors (x-axis) and different models (subplots).
  • ...and 5 more figures
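
The operation described in the Figure 1 caption can be checked numerically for a single output voxel. The sketch below is a minimal pure-Python illustration, not the authors' implementation: all sizes, the random factors, and the names `u_in`, `u_out`, and `core` are assumptions made for this example. As in Figure 1, the $K \times K \times K$ kernel positions are flattened into one index. It builds a dense kernel from Tucker factors and confirms that applying the dense kernel gives the same output as the project, convolve, project-back sequence.

```python
import random

random.seed(0)

# Hypothetical sizes: input/output channels, kernel width, and channel ranks.
c_in, c_out, k, r_in, r_out = 8, 6, 3, 4, 3
taps = k ** 3  # kernel positions, flattened into a single index


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


u_in = rand_matrix(c_in, r_in)                          # input projection, u_in[i][r]
u_out = rand_matrix(c_out, r_out)                       # output projection, u_out[o][s]
core = [rand_matrix(r_in, taps) for _ in range(r_out)]  # core[s][r][t]
patch = rand_matrix(c_in, taps)                         # input neighborhood, patch[i][t]

# Path 1: assemble the dense kernel w[o][i][t] and apply it directly.
w = [[[sum(u_out[o][s] * core[s][r][t] * u_in[i][r]
           for s in range(r_out) for r in range(r_in))
       for t in range(taps)]
      for i in range(c_in)]
     for o in range(c_out)]
y_full = [sum(w[o][i][t] * patch[i][t]
              for i in range(c_in) for t in range(taps))
          for o in range(c_out)]

# Path 2: the three-step sequence.
# (a) 1x1x1 projection of every voxel in the patch into r_in channels.
z = [[sum(u_in[i][r] * patch[i][t] for i in range(c_in))
      for t in range(taps)]
     for r in range(r_in)]
# (b) spatial convolution with the small core, producing r_out channels.
h = [sum(core[s][r][t] * z[r][t]
         for r in range(r_in) for t in range(taps))
     for s in range(r_out)]
# (c) 1x1x1 projection back to the c_out output channels.
y_tucker = [sum(u_out[o][s] * h[s] for s in range(r_out))
            for o in range(c_out)]

max_diff = max(abs(a - b) for a, b in zip(y_full, y_tucker))
print(f"largest difference between the two paths: {max_diff:.2e}")
```

The two paths agree up to floating-point rounding because the dense kernel here is constructed exactly from the factors. When a trained kernel is decomposed with reduced ranks, the factorized kernel is only an approximation of the original, which is consistent with the captions above reporting artifacts for zero-shot compression at high CRs that fine-tuning removed.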