On the effectiveness of Rotation-Equivariance in U-Net: A Benchmark for Image Segmentation

Robin Ghyselinck, Valentin Delchevalerie, Bruno Dumas, Benoît Frénay

TL;DR

The paper investigates rotation-equivariant U-Nets for image segmentation by benchmarking vanilla and $E(2)$-equivariant variants across five datasets with varying orientation properties and scales. Using $C_4$, $C_8$, and $D_4$ groups within a consistent U-Net framework, it assesses both performance and training-resource costs, revealing that equivariance yields clear benefits on orientation-variant tasks (e.g., Kvasir-SEG, URDE) but can underperform on rotationally symmetric or general datasets (e.g., NucleiSeg, iSAID). The results highlight a nuanced trade-off: equivariant models can match or exceed vanilla performance on some datasets, especially with larger models and data, yet often require more compute and may not justify the gains in all contexts. The study suggests future work toward hybrid architectures that jointly learn equivariant and non-equivariant features to exploit complementary information for diverse segmentation challenges.

Abstract

Numerous studies have recently focused on incorporating different forms of equivariance in Convolutional Neural Networks (CNNs). In particular, rotation equivariance has attracted significant attention due to its relevance in many applications related to medical imaging, microscopic imaging, satellite imaging, industrial tasks, etc. While prior research has primarily focused on enhancing classification tasks with rotation-equivariant CNNs, their impact on more complex architectures, such as U-Net for image segmentation, remains scarcely explored. Indeed, previous work on integrating rotation equivariance into the U-Net architecture has focused on solving specific applications with a limited scope. In contrast, this paper aims to provide a more exhaustive evaluation of rotation-equivariant U-Nets for image segmentation across a broader range of tasks. We benchmark their effectiveness against standard U-Net architectures, assessing improvements in terms of performance and sustainability (i.e., computational cost). Our evaluation focuses on datasets in which the objects of interest have an arbitrary orientation in the image (e.g., Kvasir-SEG), but also on more standard segmentation datasets (such as COCO-Stuff), so as to explore the wider applicability of rotation equivariance beyond tasks where it is clearly relevant. The main contribution of this work is to provide insights into the trade-offs and advantages of integrating rotation equivariance for segmentation tasks.
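
To make the notion of rotation equivariance concrete, the following is a minimal NumPy sketch of a $C_4$ "lifting" correlation: one kernel is applied in its four 90-degree rotations, and rotating the input then rotates each output map and cyclically shifts the four orientation channels. This is an illustration only, not the implementation benchmarked in the paper; the helper names `correlate_valid` and `lift_c4`, the image and kernel sizes, and the random seed are all assumptions made for this example.

```python
import numpy as np


def correlate_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation (no padding, stride 1)."""
    n, m = kernel.shape
    out_h = image.shape[0] - n + 1
    out_w = image.shape[1] - m + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + n, j:j + m] * kernel)
    return out


def lift_c4(image, kernel):
    """Correlate with the four 90-degree rotations of a single kernel.

    Returns an array of shape (4, H', W'): one feature map per element of C4.
    (Hypothetical helper for illustration, not a library function.)
    """
    return np.stack(
        [correlate_valid(image, np.rot90(kernel, r)) for r in range(4)]
    )


rng = np.random.default_rng(0)       # arbitrary seed, assumption
image = rng.normal(size=(8, 8))      # toy single-channel "image"
kernel = rng.normal(size=(3, 3))     # toy square kernel

lifted = lift_c4(image, kernel)
lifted_of_rotated = lift_c4(np.rot90(image), kernel)

# Equivariance: rotating the input by 90 degrees rotates every feature map
# by 90 degrees and shifts the orientation channels by one position.
expected = np.rot90(np.roll(lifted, 1, axis=0), k=1, axes=(1, 2))
is_equivariant = np.allclose(lifted_of_rotated, expected)
print("C4-equivariant:", is_equivariant)
```

The check holds up to floating-point error for any square kernel, because rotating the input and rotating the kernel in the opposite direction visit the same products in a different order. A single unconstrained kernel has no such guarantee, which is the gap that equivariant layers are designed to close.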

Paper Structure

This paper contains 34 sections, 11 figures, 8 tables.

Figures (11)

  • Figure 1: The U-Net architecture for the large vanilla U-Net (adapted from UNet). It illustrates the encoder, bottleneck, and decoder paths with skip connections. The input image is preprocessed by two convolutional layers (not represented in this illustration). Red arrows correspond to downsampling, blue arrows to convolutions, and green arrows to upsampling. The output obtained after the last black arrow consists of the $c$ segmentation masks for the $c$ corresponding classes.
  • Figure 2: Examples of colonoscopy images with their segmentation masks from the Kvasir-SEG dataset. One can notice that the polyps vary in shape and size, and are located at different positions within the images.
  • Figure 3: Examples of images with their segmentation masks from the NucleiSeg dataset. Notice that a single image patch contains many nuclei.
  • Figure 4: Examples of images with their segmentation masks from the URDE dataset. One can notice that the masks have a cloudy, irregular structure.
  • Figure 5: Examples of images with their segmentation masks from the COCO-Stuff dataset. These images illustrate that orientation may matter. For instance, tennis players are expected to be standing up.
  • ...and 6 more figures