Computing a Characteristic Orientation for Rotation-Independent Image Analysis

Cristian Valero-Abundio, Emilio Sansano-Sansano, Raúl Montoliu, Marina Martínez García

TL;DR

General Intensity Direction (GID) is a preprocessing method that improves rotation robustness without modifying the network architecture. It rotates each image to a canonical orientation while preserving spatial structure, so standard models, including convolutional networks, process inputs more consistently across rotations.

Abstract

Handling geometric transformations, particularly rotations, remains a challenge in deep learning for computer vision. Standard neural networks lack inherent rotation invariance and typically rely on data augmentation or architectural modifications to improve robustness. Although effective, these approaches increase computational demands, require specialised implementations, or alter network structures, which limits their applicability. This paper introduces General Intensity Direction (GID), a preprocessing method that improves rotation robustness without modifying the network architecture. The method estimates a global orientation for each image and aligns the image to a canonical reference frame, allowing standard models to process inputs more consistently across different rotations. Unlike moment-based approaches that extract invariant descriptors, this method directly transforms the image while preserving spatial structure, making it compatible with convolutional networks. Experimental evaluation on the rotated MNIST dataset shows that the proposed method achieves higher accuracy than state-of-the-art rotation-invariant architectures. Additional experiments on the CIFAR-10 dataset confirm that the method remains effective under more complex conditions.
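
The "estimate a global orientation, then align to a canonical frame" idea can be illustrated with a minimal sketch. The abstract does not give the GID formula, so the orientation estimator below (the direction from the image centre to the intensity-weighted centroid) is an assumption chosen only for illustration and should not be read as the authors' definition. The function names `intensity_direction`, `rotate_nearest`, and `align_to_canonical` are hypothetical, and the nearest-neighbour rotation is written in plain Python to stay dependency-free; a real pipeline would more likely use a library routine such as `scipy.ndimage.rotate`.

```python
import math


def intensity_direction(img):
    """Angle (radians) from the image centre to the intensity-weighted centroid.

    ASSUMPTION: a stand-in orientation estimate, not the paper's GID formula.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sx = sy = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            sx += v * (x - cx)
            sy += v * (y - cy)
    if sx == 0 and sy == 0:
        return 0.0  # no preferred direction (e.g. blank or symmetric image)
    return math.atan2(sy, sx)


def rotate_nearest(img, angle):
    """Rotate img by `angle` radians about its centre (nearest-neighbour, zero fill)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c, s = math.cos(angle), math.sin(angle)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on (x, y).
            dx, dy = x - cx, y - cy
            ix = round(c * dx + s * dy + cx)
            iy = round(-s * dx + c * dy + cy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def align_to_canonical(img):
    """Rotate img so that its estimated direction points along the +x axis."""
    return rotate_nearest(img, -intensity_direction(img))
```

Under these assumptions, two copies of the same image that differ only by a rotation are mapped to approximately the same aligned image, which is then passed to an unmodified classifier. Interpolation error and pixels rotated out of the frame mean the match is exact only in simple cases.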

Paper Structure

This paper contains 10 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Conv32 architecture diagram.
  • Figure 2: Upper row: mean accuracy over 10 repetitions across rotation angles from $0^\circ$ to $359^\circ$ for different model configurations, on (a) rotated MNIST and (b) rotated CIFAR-10. Lower row: (c) mean accuracy over 10 repetitions for GID+Conv32 using different interpolation methods, evaluated across all rotation angles on the rotated MNIST dataset. The shaded area represents one standard deviation.
  • Figure 3: Effect of the GID preprocessing method on MNIST (a) and CIFAR-10 (b). In each case, the first row shows original images, the second row shows the same inputs after a random rotation, and the third row shows the result after applying GID. GID recovers a consistent orientation on MNIST but is less reliable on CIFAR-10 because of background interference.