Coordinate Independent Convolutional Networks -- Isometry and Gauge Equivariant Convolutions on Riemannian Manifolds

Maurice Weiler, Patrick Forré, Erik Verlinde, Max Welling

TL;DR

We present a unified, coordinate-free framework for convolutional neural networks on Riemannian manifolds, in which coordinate independence is achieved through gauge equivariance with respect to the manifold's G-structure. Central to the theory are G-steerable kernels and kernel field transforms, which allow weights to be shared without relying on a canonical local frame. The paper develops the necessary differential-geometric machinery (fiber bundles, associated bundles, connections, and parallel transport) and proves that GM-convolutions are equivariant under those isometries that are symmetries of the G-structure; a convolutional network on the Möbius strip serves as a worked example. By reviewing existing Euclidean CNNs, spherical CNNs, and surface CNNs as special cases, it shows how many prior architectures fit into the coordinate-independent framework, and it provides a rigorous path to constructing new, symmetry-aware networks tailored to the manifold structure, with explicit guidance on kernel design, weight sharing, and isometry handling.

Abstract

Motivated by the vast success of deep convolutional networks, there is a great interest in generalizing convolutions to non-Euclidean manifolds. A major complication in comparison to flat spaces is that it is unclear in which alignment a convolution kernel should be applied on a manifold. The underlying reason for this ambiguity is that general manifolds do not come with a canonical choice of reference frames (gauge). Kernels and features therefore have to be expressed relative to arbitrary coordinates. We argue that the particular choice of coordinatization should not affect a network's inference -- it should be coordinate independent. A simultaneous demand for coordinate independence and weight sharing is shown to result in a requirement on the network to be equivariant under local gauge transformations (changes of local reference frames). The ambiguity of reference frames depends thereby on the G-structure of the manifold, such that the necessary level of gauge equivariance is prescribed by the corresponding structure group G. Coordinate independent convolutions are proven to be equivariant w.r.t. those isometries that are symmetries of the G-structure. The resulting theory is formulated in a coordinate free fashion in terms of fiber bundles. To exemplify the design of coordinate independent convolutions, we implement a convolutional network on the Möbius strip. The generality of our differential geometric formulation of convolutional networks is demonstrated by an extensive literature review which explains a large number of Euclidean CNNs, spherical CNNs and CNNs on general surfaces as specific instances of coordinate independent convolutions.
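As a sketch of what the gauge equivariance requirement amounts to in coordinates, the relations below use the form that is common in the steerable CNN literature. The symbols $\rho$, $\rho_{\mathrm{in}}$ and $\rho_{\mathrm{out}}$ (group representations describing how feature coefficients transform) and $K$ (a kernel expressed relative to a frame) are not defined in this abstract and are assumptions made here for illustration.

```latex
% Feature coefficients in gauges A and B are related by the gauge
% transformation g_p^{BA}, which takes values in the structure group G:
f^B(p) \;=\; \rho\big(g_p^{BA}\big)\, f^A(p)

% G-steerability (gauge equivariance) constraint on a shared kernel K:
K(g\,v) \;=\; \frac{1}{|\det g|}\; \rho_{\mathrm{out}}(g)\, K(v)\, \rho_{\mathrm{in}}(g)^{-1}
\qquad \forall\, g \in G,\ v \in \mathbb{R}^d
```

For the reflection group with a scalar input ($\rho_{\mathrm{in}}(s)=1$) and a pseudoscalar output ($\rho_{\mathrm{out}}(s)=-1$), $|\det s|=1$ and the constraint reduces to $K(s\,v) = -K(v)$ for the reflection $s$, i.e. to an antisymmetric kernel.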

Paper Structure

This paper contains 255 sections, 23 theorems, 617 equations, 64 figures, 5 tables.

Key Result

Theorem 7.7

Let $\mathcal{K}$ be a kernel field whose individual kernels $\mathcal{K}_p$ at any $p\in M$ are (at most) supported on a closed ball of radius $R>0$ around the origin of $T_{\mkern-1.5mu p}\mkern-.5muM$, that is, $\operatorname{supp}(\mathcal{K}_p) \subseteq \{ v \in T_{\mkern-1.5mu p}\mkern-.5muM \mid \lVert v \rVert \leq R \}$ for all $p \in M$. The corresponding kernel field transform $\mathscr{T}_{\mathcal{K}}$ is then guaranteed to be well defined, i.e. the integral in Eq. eq:kernel_field_trafo_def_ptw

Figures (64)

  • Figure 1: Different observers $A$ and $B$ may perceive a pattern of features from a different "viewpoint". The satellites in our application are convolution kernels which summarize their local field of view around $p$ into a feature vector at $p$. Their "viewpoint" is a choice of local reference frame (gauge) at $p$, along which the kernel is aligned. Since the observations from both viewpoints represent the same pattern, the kernel responses should contain equivalent information, that is, the inference should be coordinate independent. This constrains the convolution kernels to be equivariant under local gauge transformations, i.e. changes of reference frames. The level of gauge equivariance is determined by the structure group $G$, which depends both on the manifold and the application. (Lizards adapted under the Creative Commons Attribution 4.0 International license, https://github.com/twitter/twemoji/blob/gh-pages/LICENSE-GRAPHICS, by courtesy of Twitter.)
  • Figure 2: An intuition on the inherent ambiguity of weight sharing on manifolds. Left: A common interpretation of weight sharing on the plane is to shift a kernel over the whole space. Since parallel transport is path independent on flat spaces, this is unambiguous. Middle: On curved spaces, like the sphere, parallel transport is path dependent. Different paths result in kernels that are rotated relative to each other. Right: The Möbius strip is a non-orientable manifold. Different paths can therefore result in kernels that are reflected relative to each other. Bottom: We formalize different kernel alignments by different choices of local reference frames of the corresponding tangent spaces. It is well known that no choice of reference frames (gauge) is preferred on general manifolds. Different coordinatizations are related by gauge transformations, which take values in the structure group $G$ of the manifold (the trivial group $G=\{e\}$ for the plane, rotation group $G=\operatorname{SO}(2)$ for the sphere and reflection group $G = \mathscr{R}$ for the Möbius strip). Coordinate independent CNNs address the ambiguity of reference frames by applying $G$-steerable (gauge equivariant) convolution kernels.
  • Figure 3: Sharing an $\mathscr{R}$-steerable kernel according to a given $\mathscr{R}$-structure $\mathscr{R}\mkern-1.8muM$ over a manifold $M = \mathbb{R}^2$. There are two continuous gauges (red and green) along which the kernel could be shared. Due to its $\mathscr{R}$-equivariance, the particular choice is ultimately irrelevant. The visualized kernel is antisymmetric and maps therefore between scalar and pseudoscalar fields. It is easily verified that this is indeed the case: the numerical coefficients of a scalar input field stay invariant under gauge transformations but the kernels are reflected. As they are antisymmetric, their responses will negate -- which is the transformation law of the numerical coefficients of a pseudoscalar field. A similar reasoning holds for mappings from pseudoscalars to scalars. How could we have mapped from scalars to scalars? In this case both the input and the output should be gauge invariant, requiring the kernel to be symmetric instead of antisymmetric. Symmetric kernels map furthermore between pseudoscalar fields.
  • Figure 4: Gauge transformations, isometries and their mutual relation. Left: A gauge is a choice of local reference frames (a frame field), relative to which geometric quantities may be expressed. If the manifold's structure group $G$ is non-trivial, the choice of gauge is not unique but many equivalent choices exist. Different choices of gauges (red or green) are related by gauge transformations (blue) which are by definition taking values in the structure group $G$. Visualized are orthonormal, right-handed frames, for which the gauge transformations are $G=\operatorname{SO}(2)$-valued. Middle: Isometries are the symmetries of Riemannian manifolds. They are defined as distance preserving functions $\phi: M \to M$, mapping the manifold to itself. Isometries act via pushforward on tangent vectors, reference frames and feature vectors. While gauge transformations are passive coordinate transformations, isometries are actively moving points and geometric quantities over the manifold. Right: When being expressed relative to local reference frames, the action of isometries can be thought of as inducing gauge transformations. Assume frames at $p$ (red) and $\phi(p)$ (green) to be given. A geometric quantity at $p$ (orange) is by the isometry pushed to $\phi(p)$ (purple). Since $\phi$ is an isometry, the Riemannian geometry around $p$ and $\phi(p)$ is indistinguishable, however, the pushforward of the geometric quantity is expressed relative to a new reference frame (green instead of red). One can therefore view isometries as inducing gauge transformations. If these induced gauge transformations take values in the structure group $G$, they are explained away by the $G$-steerability (gauge equivariance) of convolution kernels -- $G\mkern-1.4muM$-convolutions are then isometry equivariant. This condition is always met for $G\geq\operatorname{O}(d)$.
  • Figure 5: Identification of $T_{\mkern-1.5mu p}\mkern-.5muM\!\cong\!\mathbb{R}^2$ with $\mathbb{R}^2$ via different gauges. A (coordinate free) tangent vector $v\in T_{\mkern-1.5mu p}\mkern-.5muM$ (orange) can be represented numerically by a coordinate tuple $v^A=\psi_p^A(v)=(1,1)^\top$ relative to gauge $\psi_p^A$ (red) or, equivalently, by $v^B=\psi_p^B(v)=(\sqrt{2},0)^\top$ relative to gauge $\psi_p^B$ (green). A choice of gauge corresponds to a choice $[e_1^A,e_2^A]$ or $[e_1^B,e_2^B]$ of reference frame. On a general manifold no choice of gauge or coordinatization is preferred a priori. Different gauges, and thus reference frames, are related by gauge transformations $g_p^{BA}:=\psi_p^B\circ(\psi_p^A)^{-1}$ (blue) which take values in the thus defined structure group $G$. This figure is a graphical interpretation of the commutative diagrams in Eq. \ref{eq:commutative_diagram_TpM} and Fig. \ref{fig:trivialization_TM}. Note that gauges are immediately assigning coordinates to tangent spaces. Fig. \ref{fig:affine_charts} in Section \ref{sec:euclidean_geometry} shows a similar diagram for (affine) charts, which assign coordinates to the manifold, thereby inducing gauges ("coordinate bases").
  • ...and 59 more figures
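
The numbers in the Figure 5 caption can be checked with a few lines of Python. This is only a sketch under one assumption that the caption does not state: that frame $B$ is frame $A$ rotated by 45 degrees, so that the gauge transformation $g_p^{BA}$ is a rotation by $-45$ degrees. The helper names `rotation` and `apply` are hypothetical.

```python
import math

def rotation(theta):
    """2x2 counter-clockwise rotation matrix by angle theta, as nested tuples."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(g, v):
    """Product of a 2x2 matrix g with a 2-vector v."""
    return (g[0][0] * v[0] + g[0][1] * v[1],
            g[1][0] * v[0] + g[1][1] * v[1])

# Coefficients of the tangent vector v relative to gauge A (from the caption).
v_A = (1.0, 1.0)

# Assumption: frame B is frame A rotated by +45 degrees, so coefficients
# transform with the inverse rotation, g^{BA} = R(-45 degrees).
g_BA = rotation(-math.pi / 4)
g_AB = rotation(math.pi / 4)   # inverse gauge transformation

v_B = apply(g_BA, v_A)         # coefficients of the same vector in gauge B
v_A_again = apply(g_AB, v_B)   # transforming back recovers the gauge-A tuple

print(v_B)
print(v_A_again)
```

Up to floating-point rounding, `v_B` agrees with the tuple $(\sqrt{2},0)^\top$ given in the caption: both tuples describe the same coordinate-free vector, and only its numerical representation changes with the gauge.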

Theorems & Definitions (62)

  • Definition 7.1: ${1\times1}$ $G\mkern-1.4muM$-convolution
  • Definition 7.2: Kernel field
  • Definition 7.3: $G$-steerable kernel
  • Definition 7.4: $G\mkern-1.4muM$-convolutional kernel field
  • Definition 7.5: Transporter pullback of feature field to TM
  • Definition 7.6: Kernel field transform
  • Theorem 7.7: Kernel field transform existence for compactly supported kernels
  • Proof 1
  • Definition 7.8: $G\mkern-1.4muM$-convolution
  • Theorem 7.9: Kernel field transform in coordinates
  • ...and 52 more
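
The behaviour of reflection-steerable kernels described in the Figure 3 caption (and named in Definition 7.3) can be illustrated on a small discrete patch. The sketch below is a toy model, not the paper's implementation: the 3x3 kernels, the patch values and the helpers `reflect` and `response` are hypothetical, and a gauge transformation is modelled simply as flipping the patch along one frame axis.

```python
def reflect(patch):
    """Express a local scalar patch relative to the reflected frame
    (flip along the first frame axis, i.e. reverse each row)."""
    return [row[::-1] for row in patch]

def response(kernel, patch):
    """Kernel response: sum of elementwise products of kernel and patch."""
    return sum(k * f
               for k_row, f_row in zip(kernel, patch)
               for k, f in zip(k_row, f_row))

# Antisymmetric under the reflection: K[i][2 - j] == -K[i][j].
antisym = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Symmetric under the reflection: K[i][2 - j] == K[i][j].
sym = [[1, 2, 1],
       [2, 4, 2],
       [1, 2, 1]]

# Scalar input field sampled around a point p, relative to gauge A.
patch_A = [[3, 1, 4],
           [1, 5, 9],
           [2, 6, 5]]
patch_B = reflect(patch_A)  # same field, seen from the reflected gauge B

# Antisymmetric kernel: the response negates (pseudoscalar output).
print(response(antisym, patch_A), response(antisym, patch_B))

# Symmetric kernel: the response is gauge invariant (scalar output).
print(response(sym, patch_A), response(sym, patch_B))
```

The same kernel coefficients are used in both gauges, which is what weight sharing requires; the kernel's symmetry then decides whether the output coefficients transform as a scalar or as a pseudoscalar.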