ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D

Dmitrii Zhemchuzhnikov, Sergei Grudinin

TL;DR

ILPO-Net addresses rotational variance in 3D volumetric pattern recognition by introducing a convolution that is invariant to local pattern orientation through a Wigner-matrix-based formulation and orientation pooling. The method expands learnable filters in spherical-harmonic components, performs 3D convolution in rotated frames, reconstructs the response in rotation space, and pools over orientations to achieve invariance without sacrificing expressiveness. Empirically, ILPO-Net delivers state-of-the-art performance on the CATH and MedMNIST 3D datasets while dramatically reducing parameter counts, and filter visualizations confirm that it learns diverse, arbitrarily shaped patterns. The approach offers a principled, efficient alternative to data augmentation and broadens the applicability of robust 3D pattern recognition across disciplines.
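
The orientation-pooling idea can be illustrated with a much simpler stand-in than the Wigner-matrix formulation. The sketch below is not the authors' implementation: it replaces the continuous rotation space with the 24 axis-aligned rotations of a cube, which act exactly on a voxel grid, and takes the maximum filter response over them. The names `cube_rotations` and `pooled_response` are hypothetical and chosen only for this illustration.

```python
import itertools
import numpy as np


def cube_rotations(vol):
    """Return the 24 axis-aligned proper rotations of a cubic 3D array."""
    rotated = []
    for perm in itertools.permutations(range(3)):
        # Parity of the axis permutation: +1 for even, -1 for odd.
        parity = int(round(np.linalg.det(np.eye(3)[list(perm)])))
        for flips in itertools.product((1, -1), repeat=3):
            # Keep only determinant +1 maps, i.e. rotations, not reflections.
            if parity * flips[0] * flips[1] * flips[2] != 1:
                continue
            out = np.transpose(vol, perm)
            for axis, sign in enumerate(flips):
                if sign == -1:
                    out = np.flip(out, axis=axis)
            rotated.append(out)
    return rotated


def pooled_response(filt, patch):
    """Max over orientations of the filter-patch inner product (hard max pooling)."""
    return max(float(np.sum(r * patch)) for r in cube_rotations(filt))


rng = np.random.default_rng(0)
filt = rng.standard_normal((5, 5, 5))    # toy filter, assumed cubic
patch = rng.standard_normal((5, 5, 5))   # toy local pattern
patch_rot = np.rot90(patch, k=1, axes=(0, 1))  # same pattern, rotated by 90 degrees

plain = float(np.sum(filt * patch))
plain_rot = float(np.sum(filt * patch_rot))
pooled = pooled_response(filt, patch)
pooled_rot = pooled_response(filt, patch_rot)

print("plain response :", plain, "vs", plain_rot)
print("pooled response:", pooled, "vs", pooled_rot)
```

Because the 24 rotations form a group, rotating the input by any of them only permutes the set of responses, so the pooled value is unchanged while the plain inner product generally is not. This toy version is invariant only to those 24 rotations; the method summarized above instead reconstructs the response over rotation space from spherical-harmonic filter components.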

Abstract

Effective recognition of spatial patterns and learning their hierarchy is crucial in modern spatial data analysis. Volumetric data applications seek techniques ensuring invariance not only to shifts but also to pattern rotations. While traditional methods can readily achieve translational invariance, rotational invariance poses multiple challenges and remains an active area of research. Here, we present ILPO-Net (Invariant to Local Patterns Orientation Network), a novel approach that handles arbitrarily shaped patterns with a convolutional operation that is inherently invariant to local spatial pattern orientations, using Wigner matrix expansions. Our architecture seamlessly integrates the new convolution operator and, when benchmarked on diverse volumetric datasets such as MedMNIST and CATH, demonstrates superior performance over the baselines with significantly reduced parameter counts, up to 1000 times fewer in the case of MedMNIST. Beyond these demonstrations, ILPO-Net's rotational invariance paves the way for other applications across multiple disciplines. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/ILPONet.

Paper Structure (18 sections, 4 theorems, 53 equations, 4 figures, 2 tables)

This paper contains 18 sections, 4 theorems, 53 equations, 4 figures, 2 tables.

Key Result

Lemma D.1

Let $Y_l^k(\theta, \phi)$ be the spherical harmonic function of degree $l$ and order $k$. Then, the Lipschitz constant $L$ of $Y_l^k(\theta, \phi)$ is bounded by:

Figures (4)

  • Figure 1: Schematic illustration of the ILPO convolution. The diagram showcases the main steps involved in our convolution process: 1) Tensor product of trainable filter coefficients and spherical harmonics; 2) 3D convolution of the input image and the rotated filter coefficients; 3) Reconstruction of the convolution output in the SO(3) space; 4) Orientation (soft)-max pooling.
  • Figure 2: Standard deviation of sampled maxima relative to the true function maximum ($y$-axis) as a function of sampling size $K$ in the SO(3) space ($x$-axis).
  • Figure 3: Visualization of filters from the 1st ILPO layer of ILPONet-50. Each column corresponds to a different output channel, with rows indicating different radii and input channels. Given that the first ILPO layer has only one input channel, only three projections (radii) are shown in each column. The $x$ and $y$ axes correspond to the azimuthal and polar angles, respectively. The filters' values are shown in the Mercator projection. Red corresponds to positive values and blue to negative ones.
  • Figure 4: Visualization of filters from the last (17th) ILPO layer of ILPONet-50. Each column in the illustration represents a triplet corresponding to three different radii in the filter. Different triplets relate to different input channels, reflecting the complexity and feature extraction capabilities of deeper layers in the network. The $x$ and $y$ axes correspond to the azimuthal and polar angles, respectively. The filters' values are shown in the Mercator projection. Red corresponds to positive values and blue to negative ones.
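
The quantity plotted in Figure 2 can be mimicked on a toy problem. The sketch below is only an illustration under stated assumptions, not the paper's experiment: the test function is $f(R) = \mathrm{tr}(R)$, whose true maximum over SO(3) is 3, rotations are drawn uniformly via normalized Gaussian quaternions, and "relative" is read as the standard deviation of the sampled maxima divided by the true maximum. The helper names and the choice of 200 trials are hypothetical.

```python
import numpy as np

TRUE_MAX = 3.0  # tr(R) is maximal at the identity rotation


def sampled_traces(k, rng):
    """Traces of k uniformly random rotations, computed from unit quaternions."""
    q = rng.standard_normal((k, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    # For a unit quaternion with scalar part w, tr(R) = 4 * w**2 - 1.
    return 4.0 * q[:, 0] ** 2 - 1.0


def relative_spread(k, trials, rng):
    """Std of the sampled maximum over repeated trials, divided by the true maximum."""
    maxima = np.array([sampled_traces(k, rng).max() for _ in range(trials)])
    return maxima.std() / TRUE_MAX


rng = np.random.default_rng(0)
spread = {k: relative_spread(k, 200, rng) for k in (8, 64, 512)}
for k, s in spread.items():
    print(f"K = {k:4d}  relative std of sampled maxima = {s:.4f}")
```

On this toy function the spread shrinks as the sampling size $K$ grows, which is the qualitative trend a plot of this kind is meant to show; the actual values depend on the function being sampled.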

Theorems & Definitions (8)

  • Lemma D.1
  • proof
  • Theorem D.2
  • proof
  • Theorem D.3
  • proof
  • Theorem D.4
  • proof