On Learning Deep O($n$)-Equivariant Hyperspheres

Pavlo Melnyk, Michael Felsberg, Mårten Wadenbäck, Andreas Robinson, Cuong Le

TL;DR

This work addresses learning deep features that are equivariant to orthogonal transformations in arbitrary dimensions by introducing Deep Equivariant Hyperspheres (DEH), which combine regular $n$-simplexes with $n$-dimensional spherical decision surfaces. The authors derive a simplex change-of-basis matrix $\textbf{M}_n$, construct $n$D equivariant spheres, and cascade them to build deep, point-based representations; they also propose an invariant Gram-based operator $\boldsymbol{\triangle} = \textbf{Y}\textbf{Y}^\top$ to capture higher-order relations. Theoretical results establish the $\text{O}(n)$-equivariance of the neuron and are accompanied by practical techniques for normalization, bias, and higher-order interactions. Empirical validation on $\text{O}(3)$ and $\text{O}(5)$ tasks shows DEH outperforming several baselines while offering a favorable speed/performance trade-off. The approach generalizes to any dimension, enabling scalable, geometry-aware learning for 3D/4D data with potential applications in molecular design and related domains; code is released at the provided repository.

Abstract

In this paper, we utilize hyperspheres and regular $n$-simplexes and propose an approach to learning deep features equivariant under the transformations of $n$D reflections and rotations, encompassed by the powerful group of O$(n)$. Namely, we propose O$(n)$-equivariant neurons with spherical decision surfaces that generalize to any dimension $n$, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points and propose an invariant operator based on the relation between two points and a sphere, which, as we show, turns out to be a Gram matrix. Using synthetic and real-world data in $n$D, we experimentally verify our theoretical contributions and find that our approach is superior to the competing methods on O$(n)$-equivariant benchmark datasets (classification and regression), demonstrating a favorable speed/performance trade-off. The code is available at https://github.com/pavlo-melnyk/equivariant-hyperspheres.
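The claim that a Gram matrix yields an invariant operator can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the simplest setting, in which the rows of a feature matrix `Y` are vectors that transform as $y \mapsto Ry$ for an orthogonal $R$. The names `random_orthogonal`, `gram`, `Y`, and `R` are hypothetical and chosen only for illustration.

```python
import numpy as np

def random_orthogonal(n, rng):
    # The Q factor of a QR decomposition of a Gaussian matrix is orthogonal
    # (it may be a rotation or a reflection).
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def gram(Y):
    # Pairwise inner products between the rows of Y.
    return Y @ Y.T

rng = np.random.default_rng(0)
n, k = 5, 4                      # ambient dimension and number of feature vectors (illustrative)
Y = rng.standard_normal((k, n))  # rows are the feature vectors (assumed layout)
R = random_orthogonal(n, rng)

Y_transformed = Y @ R.T          # apply y -> R y to every row
gram_invariant = np.allclose(gram(Y), gram(Y_transformed))
print(gram_invariant)
```

The check relies only on $R^\top R = I$: $(YR^\top)(YR^\top)^\top = Y R^\top R\, Y^\top = YY^\top$, so the same reasoning holds for any orthogonal transformation, including reflections.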

Paper Structure (35 sections, 9 theorems, 28 equations, 4 figures, 4 tables)

Key Result

Proposition 1

Let $\textup{M}_n$ be the change-of-basis matrix defined in eq:nd_basis_matrix. Then $\textup{M}_n$ is an $(n+1)$D rotation or reflection, i.e., $\textup{M}_n \in \textup{O}(n+1)$ (see Section sec:A_numeric_instances in the Appendix for numeric examples).
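A numeric check in the spirit of the proposition can be sketched as follows. The definition in eq:nd_basis_matrix is not reproduced on this page, so the construction below is an assumption: the unit-norm vertices of a regular $n$-simplex centered at the origin are scaled by $\sqrt{n/(n+1)}$ and stacked on a constant row $(n+1)^{-1/2}\,\mathbf{1}^\top$. The paper's matrix may differ in vertex ordering, orientation, or sign, and the helper names `simplex_vertices` and `change_of_basis` are hypothetical.

```python
import numpy as np

def simplex_vertices(n):
    """Return an n x (n+1) matrix whose columns are unit-norm vertices of a
    regular n-simplex centered at the origin (one of many valid orientations)."""
    ones = np.ones((n + 1, 1))
    # QR of [1, e_1, ..., e_n]: the first column of Q spans the all-ones
    # direction; the remaining n columns span its orthogonal complement.
    A = np.hstack([ones, np.eye(n + 1)[:, :n]])
    Q, _ = np.linalg.qr(A)
    B = Q[:, 1:]  # (n+1) x n orthonormal basis of the complement
    # The columns of B.T are the standard basis vectors of R^(n+1) expressed
    # in that complement; rescale them to unit norm.
    return B.T / np.sqrt(n / (n + 1))

def change_of_basis(n):
    """ASSUMED form of M_n: scaled vertex matrix stacked on a constant row."""
    P = simplex_vertices(n)
    p = np.sqrt(n / (n + 1))
    last_row = np.full((1, n + 1), 1.0 / np.sqrt(n + 1))
    return np.vstack([p * P, last_row])

for n in (2, 3, 5):
    M = change_of_basis(n)
    is_orthogonal = np.allclose(M @ M.T, np.eye(n + 1))
    # The determinant of an orthogonal matrix is +1 (rotation) or -1 (reflection).
    print(n, is_orthogonal, round(float(np.linalg.det(M)), 6))
```

Under this assumed form, orthogonality follows from two properties of the regular simplex: its vertices sum to zero, and the vertex matrix $P$ satisfies $P P^\top = \tfrac{n+1}{n} I_n$.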

Figures (4)

  • Figure 1: The central components of Deep Equivariant Hyperspheres (best viewed in color): regular $n$-simplexes with the $n$D spherical decision surfaces located at their vertices and the simplex change-of-basis matrices $\textbf{M}_n$ (displayed for $n=2$ and $n=3$).
  • Figure 2: Left: real data experiment (the higher the accuracy the better); all the presented models are also permutation-invariant. Center and right: synthetic data experiments (the lower the mean squared error (MSE) the better); dotted lines indicate results copied from Finzi et al. (2021) ($\textup{O}(5)$ regression) or Ruhe et al. (2023) ($\textup{O}(5)$ convex hulls). Best viewed in color.
  • Figure 3: Speed/performance trade-off (the models are trained on all the available training data). Note that the desired trade-off is toward the top-left corner (higher accuracy and faster inference) in the left figure, and toward the bottom-left corner (lower error and faster inference) in the center and right figures. To measure inference time, we used an NVIDIA A100. Best viewed in color.
  • Figure 4: Architecture of our DEH model. All the operations are point-wise, i.e., shared amongst $N$ points. Each subsequent layer of equivariant hyperspheres contains $K_l$ neurons for each of the $\prod_i^{d} K_i$ preceding layer channels. The architectures of the non-permutation-invariant variants differ only in that the global aggregation function over $N$ is substituted with the flattening of the feature map.

Theorems & Definitions (17)

  • Proposition 1
  • proof
  • Lemma 2
  • proof
  • Proposition 3
  • proof
  • Theorem 4
  • proof
  • Proposition 5
  • proof
  • ...and 7 more