Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data

Ivan Diaz; Mario Geiger; Richard Iain McKinley

TL;DR

This work tackles pose variability in 3D medical image segmentation by enforcing SE(3) equivariance in voxel convolutions through learnable steerable kernels based on spherical harmonics. Building an SE(3)-equivariant Unet with irreducible representations ($l=0,1,2$) and five radial basis kernels, the method achieves robust performance without rotation-based data augmentation and improves parameter and sample efficiency. On MRI brain tumor and Mindboggle101 healthy brain segmentation, the approach yields competitive Dice scores, demonstrates strong rotational robustness, and outperforms group-convolutional baselines while using fewer parameters. The results support the practicality of equivariant networks as drop-in replacements for standard Unets in pose-variant medical imaging tasks, with code available for broader adoption.
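
As a rough illustration of what "steerable kernels based on spherical harmonics" means, such a kernel is commonly written as a learned radial profile multiplied by fixed angular functions. The schematic form below follows the general steerable-CNN literature and is not quoted from this paper: the symbols $w_k^l$ (learned weights), $\varphi_k$ (radial basis functions), and $D_{\mathrm{in}}$, $D_{\mathrm{out}}$ (representation matrices of the input and output features) are notation assumed here, and the Clebsch-Gordan coefficients that couple input and output irreps are suppressed.

$$
K(\mathbf{r}) \;=\; \sum_{l=0}^{2} \sum_{k=1}^{5} w_k^{l}\, \varphi_k\!\left(\lVert \mathbf{r} \rVert\right)\, Y^{l}\!\left(\frac{\mathbf{r}}{\lVert \mathbf{r} \rVert}\right),
\qquad
K(R\,\mathbf{r}) \;=\; D_{\mathrm{out}}(R)\, K(\mathbf{r})\, D_{\mathrm{in}}(R)^{-1} \quad \text{for all } R \in SO(3).
$$

The second relation is the steerability constraint. An irrep of degree $l$ has dimension $2l+1$, so the scalar, vector and rank-2 tensor features carry 1, 3 and 5 components respectively.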

Abstract

Convolutional neural networks (CNNs) allow for parameter sharing and translational equivariance by using convolutional kernels in their linear layers. By restricting these kernels to be SO(3)-steerable, CNNs can further improve parameter sharing. These rotationally equivariant convolutional layers have several advantages over standard convolutional layers, including increased robustness to unseen poses, smaller network size, and improved sample efficiency. Despite this, most segmentation networks used in medical image analysis continue to rely on standard convolutional kernels. In this paper, we present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics. These networks are robust to data poses not seen during training and do not require rotation-based data augmentation. In addition, we demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks, with enhanced robustness to reduced amounts of training data and improved parameter efficiency. Code to reproduce our results, and to implement the equivariant segmentation networks for other tasks, is available at http://github.com/SCAN-NRAD/e3nn_Unet.
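
The idea that constraining a kernel yields rotation equivariance can be checked numerically in a very reduced setting. The sketch below is not the paper's method: it uses only a scalar ($l=0$) isotropic kernel, circular boundary conditions, and 90-degree rotations, which are the only rotations a voxel grid represents exactly. All function names (`radial_kernel`, `circular_conv3d`, `equivariance_error`) are hypothetical helpers introduced for illustration.

```python
import numpy as np

def radial_kernel(size=3):
    """Isotropic kernel: each weight depends only on the distance from the centre."""
    ax = np.arange(size) - size // 2
    zz, yy, xx = np.meshgrid(ax, ax, ax, indexing="ij")
    return np.exp(-(zz**2 + yy**2 + xx**2) / 2.0)

def circular_conv3d(volume, kernel):
    """Circular 3D convolution built from shifted copies of the volume."""
    c = kernel.shape[0] // 2
    out = np.zeros(volume.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            for k in range(kernel.shape[2]):
                shifted = np.roll(volume, shift=(i - c, j - c, k - c), axis=(0, 1, 2))
                out += kernel[i, j, k] * shifted
    return out

def equivariance_error(volume, kernel, axes=(0, 1)):
    """Largest voxel-wise gap between rotate-then-filter and filter-then-rotate."""
    rotate_then_filter = circular_conv3d(np.rot90(volume, k=1, axes=axes), kernel)
    filter_then_rotate = np.rot90(circular_conv3d(volume, kernel), k=1, axes=axes)
    return float(np.max(np.abs(rotate_then_filter - filter_then_rotate)))

rng = np.random.default_rng(0)
volume = rng.normal(size=(8, 8, 8))

# The isotropic kernel commutes with the rotation (up to floating-point error).
iso_error = equivariance_error(volume, radial_kernel())
# An unconstrained random kernel generally does not.
generic_error = equivariance_error(volume, rng.normal(size=(3, 3, 3)))
print(iso_error, generic_error)
```

Steerable kernels extend this property to arbitrary rotations and to vector and tensor features, which an isotropic scalar kernel cannot represent.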

Paper Structure (25 sections, 4 equations, 9 figures, 5 tables)

Introduction
Building an equivariant segmentation network
Irreducible Representations
Equivariant voxel convolution
Pooling, upsampling, non-linearities and normalization layers
Related Work
Methods
Model architectures
Irreducible representations
Kernel dimension and radial basis functions
Reference and equivariant Unet architectures
Datasets and Experiments
Medical Image Decathlon: Brain Tumor segmentation
Mindboggle101 dataset: Healthy appearing brain structure segmentation
Results
...and 10 more sections

Figures (9)

Figure 1: (Left) Our equivariant self-connection convolutional layer for feature extraction: a single irreducible representation is produced by the sum of a convolution on the scalar features ($l=0$), a convolution on the vector features ($l=1$) and a convolution on the tensor features ($l=2$), together with a self-connection layer (a voxel-wise fully connected tensor product between the irreps). (Right) Illustration of the fully connected tensor product at the beginning of our network. The input representations are our scalar image "0e" on the left and the spherical harmonics of $l$ from 0 to 2 on the right, which result in a hidden layer of scalar, vector and rank-2 tensor irreps.
Figure 2: Four imaging modalities used in the brain tumor segmentation task. The brain tumor can be clearly seen in the top left.
Figure 3: Two cross sections showing the seven brain structures chosen for the healthy-appearing brain structure segmentation task.
Figure 4: Dice score on the test set for the brain tumor segmentation task as a function of the number of volumes used for training.
Figure 5: Dice score on the test set vs rotation angle in the axial plane on the brain structure segmentation task.
...and 4 more figures
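
Figures 4 and 5 report Dice scores. For readers unfamiliar with the metric, the sketch below computes the Dice overlap for a single label on two integer label volumes. It is a generic illustration, not the paper's evaluation code: the function name `dice_score` and the convention of returning 1.0 when a label is absent from both volumes are assumptions made here.

```python
import numpy as np

def dice_score(pred, target, label):
    """Dice overlap 2|P ∩ T| / (|P| + |T|) for one label in two label volumes."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        # Assumed convention: a label absent from both volumes counts as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(p, t).sum() / denom)

# Toy example: an 8-voxel target cube and a 12-voxel prediction that contains it.
target = np.zeros((4, 4, 4), dtype=int)
target[1:3, 1:3, 1:3] = 1
pred = np.zeros((4, 4, 4), dtype=int)
pred[1:3, 1:3, 1:4] = 1

foreground_dice = dice_score(pred, target, label=1)  # 2 * 8 / (12 + 8)
print(foreground_dice)
```

A rotation-robustness curve of the kind described in Figure 5 would repeat this computation on predictions for rotated copies of each test volume, one rotation angle at a time.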
