Learning with Exact Invariances in Polynomial Time

Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet

TL;DR

The paper tackles learning under exact symmetries in kernel regression, a setting where naive approaches like data augmentation or group averaging are computationally infeasible for large groups. By leveraging the Laplace–Beltrami spectrum and the commutativity of group actions with the Laplacian, it reformulates invariance constraints into linear conditions across spectral eigenspaces, enabling a polynomial-time procedure. The proposed Spectral Averaging (Spec-Avg) method reduces the constraint set to a generator-based subset, truncates the spectral expansion, and solves closed-form projections in each eigenspace to enforce exact invariance, achieving the same minimax-risk rate as non-invariant kernel regression. This yields a statistically optimal and computationally efficient approach for learning with exact invariances on manifolds, with potential extensions to broader oracle models and kernel-trick adaptations.
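The idea of turning invariance into linear constraints inside each eigenspace can be illustrated on a toy case. The sketch below is not the paper's Spec-Avg implementation; it assumes the manifold is the circle, the group is generated by the single reflection $x \mapsto -x$, inputs are sampled uniformly, and coefficients are estimated by plain empirical averages. The function names (`invariant_projector`, `eigen_features`, `fit_spectral`, `predict`) are hypothetical.

```python
import numpy as np


def invariant_projector(generator_mats):
    """Orthogonal projector onto {c : D_g c = c for every generator g}."""
    dim = generator_mats[0].shape[0]
    # Stack the linear constraints (D_g - I) c = 0, one block per generator.
    constraints = np.vstack([D_g - np.eye(dim) for D_g in generator_mats])
    _, sing_vals, vt = np.linalg.svd(constraints)
    rank = int(np.sum(sing_vals > 1e-10))
    null_basis = vt[rank:].T  # columns span the invariant subspace
    return null_basis @ null_basis.T


def eigen_features(x, k):
    """Orthonormal basis of the k-th Laplacian eigenspace on the circle."""
    return np.sqrt(2.0) * np.stack([np.cos(k * x), np.sin(k * x)], axis=-1)


def fit_spectral(x, y, num_freqs):
    """Truncated spectral estimator with per-eigenspace projection (toy)."""
    # Assumption: x -> -x acts on (cos, sin) coefficients as diag(1, -1).
    reflection = np.diag([1.0, -1.0])
    proj = invariant_projector([reflection])
    const = y.mean()  # the constant eigenfunction is always invariant
    coefs = [
        proj @ (eigen_features(x, k) * y[:, None]).mean(axis=0)
        for k in range(1, num_freqs + 1)
    ]
    return const, coefs


def predict(model, x):
    const, coefs = model
    out = np.full(np.shape(x), const, dtype=float)
    for k, c in enumerate(coefs, start=1):
        out += eigen_features(x, k) @ c
    return out


rng = np.random.default_rng(0)
x_train = rng.uniform(-np.pi, np.pi, size=500)
y_train = np.cos(x_train) + 0.1 * rng.standard_normal(500)
model = fit_spectral(x_train, y_train, num_freqs=3)

x_test = rng.uniform(-np.pi, np.pi, size=50)
gap = np.max(np.abs(predict(model, x_test) - predict(model, -x_test)))
print("max |f(x) - f(-x)| on test points:", gap)
```

Only the generator's matrix enters the constraints, so the cost of the projection does not depend on the size of the group; in this toy case the projection simply removes the sine coefficients.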

Abstract

We study the statistical-computational trade-offs for learning with exact invariances (or symmetries) using kernel regression. Traditional methods, such as data augmentation, group averaging, canonicalization, and frame-averaging, either fail to provide a polynomial-time solution or are not applicable in the kernel setting. However, with oracle access to the geometric properties of the input space, we propose a polynomial-time algorithm that learns a classifier with \emph{exact} invariances. Moreover, our approach achieves the same excess population risk (or generalization error) as the original kernel regression problem. To the best of our knowledge, this is the first polynomial-time algorithm to achieve exact (not approximate) invariances in this context. Our proof leverages tools from differential geometry, spectral theory, and optimization. A key result in our development is a new reformulation of the problem of learning under invariances as optimizing an infinite number of linearly constrained convex quadratic programs, which may be of independent interest.

Paper Structure

This paper contains 26 sections, 7 theorems, 71 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Consider the problem of learning with invariances with respect to a finite group $G$ using a labeled dataset of size $n$ sampled from a manifold of dimension $d$. Assume that the optimal regression function belongs to the Sobolev space of functions of order $s$, i.e., $f^\star \in H^s(\mathcal{M})$

Figures (2)

  • Figure 1: Invariance Discrepancy measure of Kernel Ridge Regression (KRR) for various choices of the regularization parameter $\lambda$. The resulting estimator, KRR, is not invariant with respect to the group $G$ of sign averages $\{\pm 1\}^d$, whereas Spec-Avg is $G$-invariant by construction. Each point in the plot represents an average over 10 different random seeds. The Invariance Discrepancy measure used for this plot is defined as $\sup_{x \in \mathcal{X}, g \in G} |\widehat{f}(x) - \widehat{f}(g x)|$, where $\widehat{f}$ is the estimator. The set $\mathcal{X}$ consists of 100 points sampled independently and uniformly from the hypercube $[-1, 1]^d$.
  • Figure 2: Test error (empirical excess population risk) of KRR for different choices of the regularization parameter $\lambda$ and Spec-Avg for different choices of the sparsity parameter $D$. Conceptually, higher values of $\lambda$ and lower values of $D$ encourage sparser representations for the estimators KRR and Spec-Avg, respectively. As suggested by our theory, it can be observed that test error rates of the same order can be achieved by Spec-Avg and KRR with appropriate choices of hyperparameters. Note that the test errors are shown on a log scale. Their almost linear behavior implies that they are polynomial functions of the number of training samples with comparable orders. We note that each point in the plot represents an average over 10 different random seeds.
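The Invariance Discrepancy from the Figure 1 caption can be computed directly for small $d$. The following is a minimal sketch, assuming that $g \in \{\pm 1\}^d$ acts on $x$ by coordinate-wise multiplication and that the estimator is available as a vectorized callable; `invariance_discrepancy`, `even_fn`, and `odd_fn` are hypothetical names used only for illustration, and enumerating all $2^d$ group elements is feasible only for small $d$.

```python
import itertools

import numpy as np


def invariance_discrepancy(f_hat, points):
    """Max over sampled x and all g in {+-1}^d of |f_hat(x) - f_hat(g x)|."""
    d = points.shape[1]
    base = f_hat(points)
    worst = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=d):
        # Assumed action: flip the sign of each coordinate according to g.
        flipped = points * np.asarray(signs)
        worst = max(worst, float(np.max(np.abs(base - f_hat(flipped)))))
    return worst


rng = np.random.default_rng(0)
# 100 i.i.d. points, uniform on [-1, 1]^d, as in the caption (d = 3 here).
points = rng.uniform(-1.0, 1.0, size=(100, 3))


def even_fn(x):
    # Invariant under sign flips.
    return np.sum(x ** 2, axis=1)


def odd_fn(x):
    # Not invariant under sign flips.
    return np.sum(x, axis=1)


print("even function:", invariance_discrepancy(even_fn, points))
print("odd function:", invariance_discrepancy(odd_fn, points))
```

An estimator that is invariant by construction should score zero up to floating-point error, while a generic fitted function typically will not.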

Theorems & Definitions (32)

  • Theorem 1: Learning with exact invariances in polynomial time
  • Remark 4.1
  • Definition 5.1
  • Example 5.2
  • Example 5.3
  • Definition 2.1: Manifold
  • Definition 2.2: Local Coordinates
  • Definition 2.3: Tangent Space
  • Definition 2.4: Riemannian Metric Tensor
  • Definition 2.5: Riemannian Manifold
  • ...and 22 more