PreCM: The Padding-based Rotation Equivariant Convolution Mode for Semantic Segmentation

Xinyu Xu, Huazhen Liu, Tao Zhang, Huilin Xiong, Wenxian Yu

TL;DR

This paper tackles rotation sensitivity in semantic segmentation by introducing PreCM, a padding-based rotation-equivariant convolution mode, together with a rotation-equivariant convolution-group framework. It formulates orientation handling mathematically with a four-orientation group and shows how padding-based convolutions can enforce equivariance across arbitrary input sizes and kernel types, so that PreCM can replace standard convolutions in existing networks. The approach is validated on three datasets (Satellite Images of Water Bodies, DRIVE, FloodNet) and six networks, showing IOU gains and Rotation Difference (RD) reductions under rotated inputs and outperforming data augmentation and several rotation-equivariant baselines. The work offers a practical, plug-and-play solution applicable to multi-scale features and multiple convolution types, potentially improving robustness and efficiency in real-world segmentation tasks, while acknowledging that true arbitrary-angle equivariance remains an open challenge.
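To make the notion of equivariance over four orientations concrete, the following is a minimal NumPy sketch, not the authors' PreCM implementation. It checks whether correlating a rotated input with a correspondingly rotated kernel equals the rotated output, for the four 90-degree rotations. The helper names `correlate2d_valid` and `is_c4_equivariant`, the unpadded ("valid") correlation, and the chosen sizes are all assumptions made for illustration.

```python
import numpy as np


def correlate2d_valid(x, k, stride=1):
    """Plain 'valid' (unpadded) cross-correlation of a 2D array with a 2D kernel."""
    kh, kw = k.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)
    return out


def is_c4_equivariant(x, k, stride=1):
    """True if corr(rot(x), rot(k)) == rot(corr(x, k)) for all four 90-degree rotations."""
    base = correlate2d_valid(x, k, stride)
    return all(
        np.allclose(
            correlate2d_valid(np.rot90(x, r), np.rot90(k, r), stride),
            np.rot90(base, r),
        )
        for r in range(4)
    )


rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))   # illustrative 3x3 kernel
x7 = rng.standard_normal((7, 7))       # odd-sized input
x6 = rng.standard_normal((6, 6))       # even-sized input

print(is_c4_equivariant(x7, kernel, stride=1))  # True
print(is_c4_equivariant(x6, kernel, stride=1))  # True
print(is_c4_equivariant(x7, kernel, stride=2))  # True
print(is_c4_equivariant(x6, kernel, stride=2))  # False
```

In this toy setting the check passes exactly when the input size minus the kernel size is divisible by the stride; otherwise the strided sampling grid leaves an uncovered border on one side only, and that border moves when the input is rotated. This is one way to see why input size and convolution parameters matter for equivariance. How PreCM chooses its padding to handle such cases is specified in the paper itself and is not reproduced here.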

Abstract

Semantic segmentation is an important branch of image processing and computer vision. With the popularity of deep learning, various convolutional neural networks have been proposed for pixel-level classification and segmentation tasks. In practical scenarios, however, imaging angles are often arbitrary, as in water body images from remote sensing and capillary and polyp images in the medical domain, and prior orientation information is typically unavailable to guide these networks toward more effective features. Learning features from objects with diverse orientations is therefore a significant challenge, because the majority of CNN-based semantic segmentation networks lack the rotation equivariance needed to resist the disturbance from orientation information. To address this challenge, this paper first constructs a universal convolution-group framework that makes fuller use of orientation information and equips the network with rotation equivariance. We then mathematically design a padding-based rotation equivariant convolution mode (PreCM), which is applicable to multi-scale images and convolutional kernels and can also replace various types of convolutions, such as dilated, transposed, and asymmetric convolutions. To quantitatively assess the impact of image rotation in semantic segmentation tasks, we also propose a new evaluation metric, Rotation Difference (RD). Replacement experiments with six existing semantic segmentation networks on three datasets show that, under random-angle rotation, the average Intersection over Union (IOU) of the PreCM-based versions improves by 6.91%, 10.63%, 4.53%, 5.93%, 7.48%, and 8.33%, respectively, over the original versions, and the average RD values decrease by 3.58%, 4.56%, 3.47%, 3.66%, 3.47%, and 3.43%, respectively.
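As a rough illustration of how rotation sensitivity can show up in IOU, the sketch below computes the standard IOU of a binary prediction against the ground truth for each of the four 90-degree rotations of an input. It does not implement the RD metric, whose formula is not given in this summary. The two toy "segmenters" (`pointwise` and `top_half_only`), the helper `iou_per_rotation`, and the synthetic image are hypothetical stand-ins for a real network and dataset.

```python
import numpy as np


def iou(pred, target):
    """Intersection over Union of two binary masks (1.0 if both are empty)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, target).sum() / union


def iou_per_rotation(segment, image, mask):
    """IOU of segment(rot(image)) against rot(mask) for rotations of 0, 90, 180, 270 degrees."""
    return [
        iou(segment(np.rot90(image, r)), np.rot90(mask, r))
        for r in range(4)
    ]


def pointwise(im):
    """Toy segmenter that thresholds each pixel independently (orientation-agnostic)."""
    return im > 0.5


def top_half_only(im):
    """Toy segmenter that ignores the bottom half of the image (orientation-sensitive)."""
    out = im > 0.5
    out[im.shape[0] // 2:, :] = False
    return out


# Synthetic 8x8 image with one bright rectangle; the mask is its exact footprint.
image = np.zeros((8, 8))
image[2:5, 1:7] = 1.0
mask = image > 0.5

print(iou_per_rotation(pointwise, image, mask))      # identical IOU at every rotation
print(iou_per_rotation(top_half_only, image, mask))  # IOU changes with the rotation
```

The spread of per-rotation IOU values is the kind of behaviour a rotation-robustness metric is meant to capture: an orientation-agnostic model scores the same at every rotation, while an orientation-sensitive one does not.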

Paper Structure

This paper contains 17 sections, 26 equations, 12 figures, and 5 tables.

Figures (12)

  • Figure 1: An example of a network that has rotation equivariance.
  • Figure 2: An example of the limitation of the convolution distributive law.
  • Figure 3: An example of the padding-based convolution process.
  • Figure 4: An example of convolution in flattened form.
  • Figure 5: An example of the rotation equivariance of PreCM.
  • ...and 7 more figures