A General Framework for Robust G-Invariance in G-Equivariant Networks

Sophia Sanborn, Nina Miolane

TL;DR

This paper addresses the lossiness of traditional pooling in group-equivariant CNNs by introducing the $G$-Triple-Correlation ($G$-TC) layer, a complete, third-order invariant that preserves all signal structure except for the group action. Grounded in the triple-correlation theory on groups, the $G$-TC provides selective, robust $G$-invariance and is the lowest-degree invariant that is complete, guaranteeing uniqueness up to a basis change. The authors develop efficient, discretized implementations for a range of groups (including commutative and non-commutative ones) and demonstrate improved classification performance and robustness against invariance-based attacks on $G$-MNIST and $G$-ModelNet10 datasets compared to Max $G$-Pooling. They also discuss computational savings via symmetries and bispectral reductions, outlining promising directions for further reducing complexity while preserving the theoretical guarantees of completeness. Overall, the work redefines foundational pooling primitives in geometric deep learning and offers a principled path to robust, exact group invariance in $G$-CNNs with practical applicability to diverse symmetry groups.

Abstract

We introduce a general method for achieving robust group-invariance in group-equivariant convolutional neural networks ($G$-CNNs), which we call the $G$-triple-correlation ($G$-TC) layer. The approach leverages the theory of the triple-correlation on groups, which is the unique, lowest-degree polynomial invariant map that is also complete. Many commonly used invariant maps--such as the max--are incomplete: they remove both group and signal structure. A complete invariant, by contrast, removes only the variation due to the actions of the group, while preserving all information about the structure of the signal. The completeness of the triple correlation endows the $G$-TC layer with strong robustness, which can be observed in its resistance to invariance-based adversarial attacks. In addition, we observe that it yields measurable improvements in classification accuracy over standard Max $G$-Pooling in $G$-CNN architectures. We provide a general and efficient implementation of the method for any discretized group, which requires only a table defining the group's product structure. We demonstrate the benefits of this method for $G$-CNNs defined on both commutative and non-commutative groups--$SO(2)$, $O(2)$, $SO(3)$, and $O(3)$ (discretized as the cyclic $C_8$, dihedral $D_{16}$, chiral octahedral $O$ and full octahedral $O_h$ groups)--acting on $\mathbb{R}^2$ and $\mathbb{R}^3$ on both $G$-MNIST and $G$-ModelNet10 datasets.
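The abstract notes that the implementation needs only a table defining the group's product structure. The following is a minimal sketch of that idea, not the authors' code: it assumes a single-channel real signal, the standard definition $\tau_\Theta(g_1, g_2) = \sum_{g \in G} \Theta(g)\,\Theta(g g_1)\,\Theta(g g_2)$, and a Cayley table whose entry `[i, j]` is the index of $g_i g_j$. The names `cyclic_cayley_table` and `g_triple_correlation` are hypothetical.

```python
import numpy as np

def cyclic_cayley_table(n):
    """Cayley table of the cyclic group C_n: entry [i, j] is the index of g_i * g_j."""
    idx = np.arange(n)
    return (idx[:, None] + idx[None, :]) % n

def g_triple_correlation(theta, table):
    """tau[a, b] = sum_g theta[g] * theta[g * a] * theta[g * b] (real, single-channel signal)."""
    shifted = theta[table]  # shifted[g, a] = theta[g * a], looked up through the product table
    return np.einsum("g,ga,gb->ab", theta, shifted, shifted)

# Illustrative use on C_8 (assumed here as a stand-in for discretized SO(2)).
table = cyclic_cayley_table(8)
rng = np.random.default_rng(0)
theta = rng.normal(size=8)

tau = g_triple_correlation(theta, table)
# Acting on the signal with a group element is a cyclic shift for C_8.
tau_shifted = g_triple_correlation(np.roll(theta, 3), table)
print(tau.shape, np.allclose(tau, tau_shifted))
```

Because the function only indexes through `table`, swapping in the table of another finite group should require no other change. A multi-channel signal could be handled by applying the same computation per channel, which is an assumption of this sketch rather than a detail taken from the paper.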

Paper Structure (32 sections, 7 theorems, 39 equations, 4 figures, 2 tables)

Key Result

Proposition 1

Consider a signal $\Theta: G \mapsto \mathbb{R}^c$. The $G$-Triple-Correlation $\tau$ is $G$-invariant: $\tau(L_g[\Theta]) = \tau(\Theta)$ for all $g \in G$, where $L_g$ denotes an action of a transformation $g$ on the signal $\Theta$.
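Proposition 1 can be checked numerically on a small non-commutative group. The sketch below is illustrative only: it assumes the left-regular action $(L_h \Theta)(g) = \Theta(h^{-1} g)$, a single-channel real signal, and the symmetric group $S_3$ (not one of the groups used in the paper) built from permutations. All function names are hypothetical.

```python
import itertools
import numpy as np

# Elements of S_3 as permutation tuples; itertools yields the identity (0, 1, 2) first.
perms = list(itertools.permutations(range(3)))
index = {p: i for i, p in enumerate(perms)}
n = len(perms)

def compose(p, q):
    """Group product: (p * q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

# Cayley table: table[i, j] is the index of perms[i] * perms[j].
table = np.array([[index[compose(p, q)] for q in perms] for p in perms])
# inverse[h] is the index a with perms[h] * perms[a] == identity (index 0).
inverse = np.array([int(np.argmax(table[h] == 0)) for h in range(n)])

def triple_correlation(theta):
    """tau[a, b] = sum_g theta[g] * theta[g * a] * theta[g * b]."""
    shifted = theta[table]
    return np.einsum("g,ga,gb->ab", theta, shifted, shifted)

def left_action(theta, h):
    """(L_h theta)(g) = theta(h^{-1} g)."""
    return theta[table[inverse[h]]]

rng = np.random.default_rng(1)
theta = rng.normal(size=n)
tau = triple_correlation(theta)

invariant = all(
    np.allclose(triple_correlation(left_action(theta, h)), tau) for h in range(n)
)
noncommutative = not np.array_equal(table, table.T)
print(invariant, noncommutative)
```

The invariance follows from re-indexing the sum with $g' = h^{-1} g$, so the check should hold for any finite group supplied as a Cayley table.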

Figures (4)

  • Figure 1: Achieving Robust $G$-Invariance in $G$-CNNs with the $G$-Triple-Correlation. The output of a $G$-Convolutional layer is equivariant to the actions of $G$ on the domain of the signal. To identify signals that are equivalent up to group action, the layer can be followed by a $G$-Invariant map that eliminates this equivariance. In $G$-CNNs, Max $G$-Pooling is commonly used for this purpose. Taking the maximum of the $G$-Convolutional equivariant output is indeed invariant to the actions of the group. However, it is also lossy: many non-equivalent output vectors have the same maximum. Our method, the $G$-Triple-Correlation, is the lowest-order polynomial invariant map that is complete [sturmfels2008algorithms]. As a complete invariant, it preserves all information about the signal structure, removing only the action of the group. Our approach thus provides a new foundation for achieving robust $G$-Invariance in $G$-CNNs.
  • Figure 2: Datasets. The $O(2)$-MNIST (top) and $O(3)$-ModelNet10 (bottom) datasets are generated by applying a random (rotation, reflection) pair to each element of the original datasets. Although we visualize the continuous group here, in practice, we discretize the group $O(3)$ as the full octahedral group $O_h$ to reduce computational complexity. $SO(2)$ and $SO(3)$ datasets are generated similarly, by applying a random rotation to each datapoint.
  • Figure 3: Models. We compare two simple architectures, each composed of a single $G$-Conv block followed by either a Max $G$-Pool layer or a $G$-TC Layer, and then an MLP Classifier.
  • Figure 4: Optimized Model Metamers. For each model, 100 targets from the MNIST dataset were randomly selected. 100 inputs were randomly initialized and optimized to yield identical pre-classifier model representations. All inputs optimized for the $G$-TC Model converge to the orbit of the target. By contrast, metamers that bear no semantic relationship to the targets are found for every target in the Max $G$-Pooling model.
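The lossiness of the max described in the Figure 1 caption can be illustrated with a toy example. This is a sketch under stated assumptions, not a reproduction of the paper's experiments: the signals are hand-picked vectors on the cyclic group $C_4$, and `triple_correlation_cyclic` is a hypothetical helper using $\tau(g_1, g_2) = \sum_g \Theta(g)\,\Theta(g + g_1)\,\Theta(g + g_2)$.

```python
import numpy as np

def triple_correlation_cyclic(theta):
    """Triple correlation of a real signal on the cyclic group C_n."""
    n = len(theta)
    idx = np.arange(n)
    shifted = theta[(idx[:, None] + idx[None, :]) % n]  # shifted[g, a] = theta[g + a mod n]
    return np.einsum("g,ga,gb->ab", theta, shifted, shifted)

a = np.array([3.0, 1.0, 0.0, 2.0])
b = np.array([3.0, 0.0, 0.0, 0.0])  # same maximum as a, but not a cyclic shift of a
a_shifted = np.roll(a, 1)           # same orbit as a under C_4

# Max pooling is invariant but cannot tell a and b apart.
same_max = a.max() == b.max() == a_shifted.max()

# The triple correlation is also invariant, yet it separates the two orbits.
tc_invariant = np.allclose(
    triple_correlation_cyclic(a), triple_correlation_cyclic(a_shifted)
)
tc_separates = not np.allclose(
    triple_correlation_cyclic(a), triple_correlation_cyclic(b)
)
print(same_max, tc_invariant, tc_separates)
```

Here `a` and `b` play the role of metamers for the max: they produce the same pooled value while lying in different orbits, which is the kind of collision the Figure 4 caption reports for the Max $G$-Pooling model.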

Theorems & Definitions (14)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Example 1
  • Example 2
  • Example 3
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • ...and 4 more