Affine Invariance in Continuous-Domain Convolutional Neural Networks
Ali Mohaddes, Johannes Lederer
TL;DR
This work tackles the challenge of achieving affine invariance in continuous-domain convolutional neural networks by embedding inputs into the affine group $G_2 = \mathbb{R}^2 \ltimes \mathrm{GL}_2(\mathbb{R})$ and employing a three-layer lifting-convolution-projection GCNN. The authors establish stability under affine transforms from $\mathrm{GL}_2(\mathbb{R})$ via layer-wise invariance theorems, and derive practical computation strategies that reduce $G_2$ convolutions to real-space integrals using a QR-based decomposition of $\mathrm{GL}_2(\mathbb{R})$. They validate the approach experimentally on affine-transformed digits, showing that the GCNN outperforms a standard CNN, particularly in data-scarce settings (e.g., mean accuracies such as $0.6950$ vs. $0.3150$ and $0.800$ vs. $0.720$ in reported scenarios). These contributions broaden the class of geometric transformations addressable by GCNNs and offer computationally feasible means to enforce affine invariance in deep learning pipelines.
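To make the QR-based decomposition concrete, the following is a minimal NumPy sketch of factoring an invertible $2 \times 2$ matrix into an orthogonal factor and an upper-triangular factor with positive diagonal. The function name `qr_gl2`, the sign normalization, and the singularity tolerance are illustrative assumptions; the summary above does not specify the exact parametrization the authors use.

```python
import numpy as np

def qr_gl2(A, tol=1e-12):
    """Factor an invertible 2x2 matrix as A = Q @ R.

    Q is orthogonal (a rotation, or a reflection when det(A) < 0) and
    R is upper triangular with a strictly positive diagonal.
    Hypothetical helper for illustration, not the authors' code.
    """
    A = np.asarray(A, dtype=float)
    if A.shape != (2, 2):
        raise ValueError("expected a 2x2 matrix")
    if abs(np.linalg.det(A)) < tol:
        raise ValueError("matrix is not in GL_2(R) (numerically singular)")

    Q, R = np.linalg.qr(A)

    # np.linalg.qr does not fix the signs of diag(R); normalize them.
    # With D = diag(signs), Q @ D scales columns and D @ R scales rows,
    # and (Q @ D) @ (D @ R) = Q @ R because D @ D is the identity.
    signs = np.sign(np.diag(R))
    Q = Q * signs
    R = signs[:, None] * R
    return Q, R

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
Q, R = qr_gl2(A)
```

With the diagonal of `R` constrained to be positive, the factorization of an invertible matrix is unique, which is what makes it usable as a coordinate system on the group.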
Abstract
The notion of group invariance helps neural networks recognize patterns and features under geometric transformations. Group convolutional neural networks enhance traditional convolutional neural networks by incorporating group-based geometric structures into their design. This research studies affine invariance in continuous-domain convolutional neural networks. Whereas other research considers isometric invariance or similarity invariance, we focus on the full structure of affine transforms generated by the group of all invertible $2 \times 2$ real matrices (the general linear group $\mathrm{GL}_2(\mathbb{R})$). We introduce a new criterion to assess the invariance of two signals under affine transformations. The input image is embedded into the affine Lie group $G_2 = \mathbb{R}^2 \ltimes \mathrm{GL}_2(\mathbb{R})$ to facilitate group convolution operations that respect affine invariance. Then, we analyze the convolution of embedded signals over $G_2$. In sum, our research could eventually extend the scope of geometric transformations that standard deep-learning pipelines can handle.
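As an illustration of the group structure on $G_2 = \mathbb{R}^2 \ltimes \mathrm{GL}_2(\mathbb{R})$, the sketch below represents an element as a pair `(x, A)` of a translation and an invertible matrix. It assumes the standard semidirect-product law $(x, A)(y, B) = (x + Ay, AB)$, which the abstract does not state explicitly; the function names are hypothetical.

```python
import numpy as np

def compose(g, h):
    """Group product in R^2 x| GL_2(R), assuming (x, A)(y, B) = (x + A y, A B)."""
    x, A = g
    y, B = h
    return (x + A @ y, A @ B)

def inverse(g):
    """Group inverse: (x, A)^{-1} = (-A^{-1} x, A^{-1})."""
    x, A = g
    A_inv = np.linalg.inv(A)
    return (-A_inv @ x, A_inv)

def act(g, p):
    """Action of g = (x, A) on a point p of the plane: p -> A p + x."""
    x, A = g
    return A @ p + x

# Two example affine transformations (arbitrary values for illustration).
g = (np.array([1.0, -2.0]), np.array([[2.0, 1.0], [0.0, 1.0]]))
h = (np.array([0.5, 0.5]), np.array([[0.0, -1.0], [1.0, 0.0]]))
p = np.array([3.0, 4.0])

gh = compose(g, h)
identity = compose(g, inverse(g))
```

Under this convention, acting with the product `gh` on a point gives the same result as acting with `h` first and then `g`, so composing group elements matches composing the corresponding affine maps.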
