The Selective G-Bispectrum and its Inversion: Applications to G-Invariant Networks

Simon Mataigne, Johan Mathe, Sophia Sanborn, Christopher Hillar, Nina Miolane

TL;DR

The G-Bispectrum computation is shown to contain redundancies that can be removed, yielding a selective G-Bispectrum with $\mathcal{O}(|G|)$ complexity, and desirable mathematical properties of the selective G-Bispectrum are proved.

Abstract

An important problem in signal processing and deep learning is to achieve invariance to nuisance factors that are not relevant for the task. Since many of these factors can be described as the action of a group $G$ (e.g., rotations, translations, scalings), we want methods to be $G$-invariant. The $G$-Bispectrum extracts every characteristic of a given signal up to group action: for example, the shape of an object in an image, but not its orientation. Consequently, the $G$-Bispectrum has been incorporated into deep neural network architectures as a computational primitive for $G$-invariance, akin to a pooling mechanism but with greater selectivity and robustness. However, the computational cost of the $G$-Bispectrum ($\mathcal{O}(|G|^2)$, where $|G|$ is the size of the group) has limited its widespread adoption. Here, we show that the $G$-Bispectrum computation contains redundancies that can be removed, yielding a selective $G$-Bispectrum with $\mathcal{O}(|G|)$ complexity. We prove desirable mathematical properties of the selective $G$-Bispectrum and demonstrate how its integration in neural networks enhances accuracy and robustness compared to traditional approaches, while enjoying considerable speed-ups compared to the full $G$-Bispectrum.
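To make the $\mathcal{O}(|G|^2)$ versus $\mathcal{O}(|G|)$ contrast concrete, here is a minimal NumPy sketch for the cyclic group $C_n$ only (an assumption: the paper treats general finite groups). It assumes a real signal whose Fourier coefficients are all nonzero. The full bispectrum stores $n^2$ coefficients; the illustrative "selective" set used here, $\beta_{0,0}$ together with the row $\beta_{1,k}$ for $k = 0, \dots, n-1$, stores $n + 1$ and, after an $\mathcal{O}(n \log n)$ FFT, costs $\mathcal{O}(n)$ to form. This particular coefficient set and the function names `bispectrum`, `selective_bispectrum`, and `invert_selective` are hypothetical choices for illustration, not necessarily the authors' selection rule or code.

```python
import numpy as np


def bispectrum(x):
    """Full bispectrum of a real signal on C_n: all n * n scalar coefficients."""
    X = np.fft.fft(x)
    n = len(x)
    k = np.arange(n)
    # beta[k1, k2] = X[k1] * X[k2] * conj(X[(k1 + k2) mod n])
    return X[:, None] * X[None, :] * np.conj(X[(k[:, None] + k[None, :]) % n])


def selective_bispectrum(x):
    """n + 1 coefficients: beta[0, 0] and the row beta[1, k], k = 0..n-1 (illustrative choice)."""
    X = np.fft.fft(x)
    n = len(x)
    k = np.arange(n)
    b00 = X[0] * X[0] * np.conj(X[0])
    row = X[1] * X[k] * np.conj(X[(1 + k) % n])
    return b00, row


def invert_selective(b00, row):
    """Recover a real signal up to a cyclic shift, assuming all its Fourier coefficients are nonzero."""
    n = len(row)
    F = np.zeros(n, dtype=complex)
    F[0] = np.cbrt(b00.real)              # real signal: X[0] is real, so b00 = X[0] ** 3
    F[1] = np.sqrt((row[0] / F[0]).real)  # row[0] = |X[1]|**2 * X[0]; phase of X[1] provisionally 0
    for k in range(1, n - 1):
        # row[k] = X[1] X[k] conj(X[k+1])  =>  X[k+1] = conj(row[k] / (X[1] X[k]))
        F[k + 1] = np.conj(row[k] / (F[1] * F[k]))
    # The wrap-around coefficient row[n-1] = X[1] X[n-1] conj(X[0]) fixes exp(i * n * phi),
    # i.e. the phase phi of X[1] up to a multiple of 2*pi/n, which is an integer cyclic shift.
    phi = np.angle(row[n - 1] / (F[1] * F[n - 1] * np.conj(F[0]))) / n
    F *= np.exp(1j * phi * np.arange(n))
    return np.fft.ifft(F).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=8)
    x_shifted = np.roll(x, 3)
    # The full bispectrum does not change under a cyclic shift of the input.
    print(np.allclose(bispectrum(x), bispectrum(x_shifted)))
    # The n + 1 selective coefficients suffice to rebuild x up to some cyclic shift.
    x_rec = invert_selective(*selective_bispectrum(x_shifted))
    print(any(np.allclose(x_rec, np.roll(x, s)) for s in range(8)))
```

The recursion only ever divides by previously recovered coefficients, which is why the nonvanishing-Fourier-coefficient assumption is needed; with near-zero coefficients the inversion becomes numerically unstable.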

Paper Structure (21 sections, 8 theorems, 6 equations, 7 figures, 2 tables, 1 algorithm)

Key Result

Theorem 2.3

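[kakarala_finite_groups] The $G$-Bispectrum of a signal $\Theta: G \to \mathbb{R}$, $\beta(\Theta)$, is given by:

$$\beta(\Theta)_{\rho_1,\rho_2} = \left[\hat{\Theta}_{\rho_1} \otimes \hat{\Theta}_{\rho_2}\right] C_{\rho_1,\rho_2} \left[\bigoplus_{\rho \in \rho_1 \otimes \rho_2} \hat{\Theta}_{\rho}^{\dagger}\right] C_{\rho_1,\rho_2}^{\dagger},$$

where $\hat{\Theta}_{\rho}$ is the Fourier transform of $\Theta$ at the irreducible representation (irrep) $\rho$, the direct sum runs over the irreps appearing in the decomposition of $\rho_1 \otimes \rho_2$, and $C_{\rho_1,\rho_2}$ is a unitary matrix called the Clebsch-Gordan matrix, whose definition is recalled in the group-theory appendix. For each pair $\rho_1, \rho_2$, the matrix $\beta(\Theta)_{\rho_1,\rho_2}$ is called a $G$-bispectral coefficient.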
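As a worked special case (a standard specialization, not a statement taken from this summary), take the cyclic group $C_n$. Its irreps are one-dimensional, the tensor product of two irreps is again a single irrep, and the Clebsch-Gordan matrices reduce to $1$, so each $G$-bispectral coefficient is a scalar. The sketch assumes the DFT sign convention $\hat{\Theta}_k = \sum_g \Theta(g)\, e^{-2\pi i k g/n}$; the opposite sign only relabels $k \mapsto -k$.

```latex
\rho_k(g) = e^{2\pi i k g / n}, \qquad
\rho_{k_1} \otimes \rho_{k_2} = \rho_{(k_1 + k_2) \bmod n}, \qquad
C_{\rho_{k_1}, \rho_{k_2}} = 1,

\beta(\Theta)_{k_1, k_2}
  = \hat{\Theta}_{k_1} \, \hat{\Theta}_{k_2} \, \overline{\hat{\Theta}_{(k_1 + k_2) \bmod n}},
\qquad
\hat{\Theta}_k = \sum_{g=0}^{n-1} \Theta(g) \, e^{-2\pi i k g / n}.

% Cyclic shift \Theta'(g) = \Theta(g - s), for which \hat{\Theta}'_k = e^{-2\pi i k s / n} \hat{\Theta}_k:
\beta(\Theta')_{k_1, k_2}
  = e^{-2\pi i s \left(k_1 + k_2 - \left[(k_1 + k_2) \bmod n\right]\right) / n} \, \beta(\Theta)_{k_1, k_2}
  = \beta(\Theta)_{k_1, k_2}.
```

The last equality holds because $k_1 + k_2 - [(k_1 + k_2) \bmod n]$ is a multiple of $n$, so every one of the $n^2$ coefficients is invariant to cyclic shifts.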

Figures (7)

  • Figure 1: Illustration of the different proposed $G$-CNN modules [cohenc16, sanborn2023general]. The input $f$ is first processed by the $G$-convolutional layer composed of $K$ filters $\{\phi_k\}_{k=1}^K$. Then, an invariant layer is chosen (Max $G$-pooling, $G$-TC, or the selective/full $G$-Bispectrum layer). Finally, the "pooled" output is fed to a neural network designed for the machine learning task at hand.
  • Figure 2: Computation of the selective $G$-Bispectrum for the full octahedral group $O_h$. The color gradient represents the order in which the $G$-bispectral coefficients are computed. The Kronecker table shows which irreps emerge from the decomposition into irreps of the tensor product $\rho_i \otimes \rho_j$. The selective $G$-Bispectrum has only $6$ coefficients, compared to $100$ coefficients for the full $G$-Bispectrum.
  • Figure 3: Comparison of full and selective $G$-Bispectra for the dihedral group $D_4$ (left) and the octahedral group $O_h$ (right). The Kronecker tables of both groups show which irreps emerge from the decomposition into irreps of the tensor product of irreps $\rho_i \otimes \rho_j$. The colored boxes highlight the bispectral coefficients chosen for the full and selective Bispectra. Our proposed selective Bispectrum captures the same information as the full Bispectrum but with significantly fewer coefficients.
  • Figure 4: Evolution of the average training times for the different invariant layers. The parameter $n$ is the size of the groups $C_n$ and $D_n$. The averages and standard deviations are obtained over $10$ runs. For all runs, the number of parameters of the complete neural network (filters and MLP) is set to $50000$ and $150000$ for $\mathrm{SO}(2)$ and $\mathrm{O}(2)$, respectively. Standard deviations are reported as vertical intervals. When an FFT is available, our selective $G$-Bispectrum is significantly faster than the other complete $G$-invariant pooling layers. Specifically, with $C_{2^7}$, training on a dataset of $60000$ images takes only $247$ seconds, whereas the $G$-TC requires $1465$ seconds.
  • Figure 5: At the top: evolution of the average classification accuracy on rotated MNIST ($\mathrm{SO}(2)$-MNIST) and rotated-reflected MNIST ($\mathrm{O}(2)$-MNIST) over $10$ runs as the number of filters varies from $2$ to $20$, for Avg $G$-pooling, Max $G$-pooling, the selective $G$-Bispectrum, and the $G$-TC. The number of parameters of each model is kept equal for a fair comparison. Standard deviations are shown as vertical intervals. With the selective $G$-Bispectrum layer, fewer convolutional filters are needed for a given accuracy: with only $K = 2$ filters, we achieve 96% accuracy, compared to 63% with the Max $G$-pooling layer. Our approach allows $G$-CNNs to maintain competitive accuracy while using smaller neural networks. At the bottom: the same results with time instead of the number of filters on the $x$-axis; the dotted lines reproduce the evolution of $K$ from the top panels. The selective $G$-Bispectrum is faster than the $G$-TC when an FFT is available, i.e., here for $\mathrm{SO}(2)$-MNIST. Recall that an FFT can be implemented for many groups [Diaconis1990EfficientCO].
  • ...and 2 more figures
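Figures 2 and 3 rely on Kronecker tables, i.e. the multiplicity of each irrep $\rho$ in $\rho_i \otimes \rho_j$. A minimal sketch for $D_4$ (order 8), assuming its standard character table with irreps labelled $A_1, A_2, B_1, B_2, E$, computes the table from the usual character inner product $m_\rho = \frac{1}{|G|} \sum_{C} |C| \, \chi_{\rho_i}(C) \chi_{\rho_j}(C) \overline{\chi_{\rho}(C)}$, summed over conjugacy classes $C$. The labels, class ordering, and helper names (`CHARACTERS`, `multiplicity`, `kronecker_table`) are hypothetical choices and may not match the paper's $\rho_i$ indexing; the sketch does not reproduce the authors' rule for selecting coefficients.

```python
import numpy as np

# Conjugacy classes of D4, ordered as: identity, rotations by +-90 deg, rotation by 180 deg,
# the two reflections of one type, the two reflections of the other type.
CLASS_SIZES = np.array([1, 2, 1, 2, 2])
ORDER = CLASS_SIZES.sum()  # |D4| = 8

# Standard character table of D4, one row per irrep.
CHARACTERS = {
    "A1": np.array([1, 1, 1, 1, 1]),
    "A2": np.array([1, 1, 1, -1, -1]),
    "B1": np.array([1, -1, 1, 1, -1]),
    "B2": np.array([1, -1, 1, -1, 1]),
    "E": np.array([2, 0, -2, 0, 0]),
}


def multiplicity(chi_product, chi_irrep):
    """Multiplicity of an irrep in a representation, via the class-weighted character inner product."""
    inner = np.sum(CLASS_SIZES * chi_product * np.conj(chi_irrep)).real / ORDER
    return int(round(inner))


def kronecker_table():
    """Map each pair (rho_i, rho_j) to {irrep: multiplicity} for the irreps in rho_i (x) rho_j."""
    table = {}
    for a, chi_a in CHARACTERS.items():
        for b, chi_b in CHARACTERS.items():
            # The character of a tensor product is the pointwise product of the characters.
            chi_ab = chi_a * chi_b
            table[(a, b)] = {
                name: m
                for name, chi in CHARACTERS.items()
                if (m := multiplicity(chi_ab, chi)) > 0
            }
    return table


if __name__ == "__main__":
    print(kronecker_table()[("E", "E")])  # {'A1': 1, 'A2': 1, 'B1': 1, 'B2': 1}
```

The $E \otimes E$ entry shows why a single pair of irreps can already involve every one-dimensional irrep of $D_4$, which is the kind of information the Kronecker tables in the figures encode.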

Theorems & Definitions (11)

  • Definition 2.1
  • Definition 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Proposition 3.1
  • proof
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • ...and 1 more