On the hardness of learning under symmetries

Bobak T. Kiani, Thien Le, Hannah Lawrence, Stefanie Jegelka, Melanie Weber

TL;DR

The paper tackles the computational hardness of learning equivariant neural networks under gradient-based optimization. By extending the correlational statistical query (CSQ) framework to invariant architectures (notably GNNs and frame-averaged CNNs) and analyzing Gaussian input distributions, it derives exponential and superpolynomial lower bounds that persist despite symmetry. It also proves NP-hardness for proper learning of GNNs and provides experiments that corroborate the hardness results. The findings suggest that symmetry alone is insufficient for efficient learnability in worst-case settings, underscoring the need for additional inductive biases or problem structure to achieve practical guarantees.

Abstract

We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning-theoretic research has demonstrated that actually learning shallow, fully-connected (i.e., non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.
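
The abstract's remark that the CSQ model "encompasses gradient descent" (the theme of the paper's Example 1, "GD from CSQ") can be illustrated with a minimal sketch. It assumes a linear model with squared loss, and it simulates the oracle by a finite-sample average plus bounded uniform noise of size at most `tau`. The names `csq_oracle` and `gd_via_csq` are hypothetical and not taken from the paper. The point is that the only label-dependent part of the gradient is a correlation E[y · ∇f(x)], which is exactly what a CSQ answers.

```python
import numpy as np


def csq_oracle(phi, xs, ys, tau, rng):
    # Simulated correlational statistical query (assumption: the expectation
    # is a finite-sample mean). One query per output coordinate of phi; each
    # answer is perturbed by at most tau to mimic the oracle's tolerance.
    answers = (phi(xs) * ys[:, None]).mean(axis=0)
    return answers + rng.uniform(-tau, tau, size=answers.shape)


def gd_via_csq(xs, ys, lr=0.1, steps=200, tau=1e-3, seed=0):
    # Gradient descent on L(w) = 0.5 * E[(w.x - y)^2] for f_w(x) = w.x.
    # grad L(w) = E[(w.x) x] - E[y x]: the first term needs inputs only, the
    # second is a CSQ with query phi(x) = grad_w f_w(x) = x.
    rng = np.random.default_rng(seed)
    w = np.zeros(xs.shape[1])
    for _ in range(steps):
        label_free = ((xs @ w)[:, None] * xs).mean(axis=0)
        label_term = csq_oracle(lambda x: x, xs, ys, tau, rng)
        w -= lr * (label_free - label_term)
    return w


# Usage: recover a planted linear target from Gaussian inputs.
data_rng = np.random.default_rng(1)
xs = data_rng.standard_normal((5000, 3))
w_true = np.array([1.0, -2.0, 0.5])
w_hat = gd_via_csq(xs, xs @ w_true)
print(np.round(w_hat, 2))
```

For a nonlinear network the query would be the parameter gradient ∇θ f_θ(x) rather than x itself. The tolerance `tau` is what lower bounds in this model exploit: if a function class contains many nearly uncorrelated targets, answers accurate only up to `tau` cannot tell them apart without many queries.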

Paper Structure (46 sections, 40 theorems, 127 equations, 4 figures, 1 table)

Key Result

Theorem 2

For a given symmetry group $G$ with representation $\rho: G \to GL(\{-1,+1\}^n)$, let $\mathcal{O}_\rho$ denote the set of orbits of $\rho$ on $\{-1,+1\}^n$, let $\|p_{\mathcal{O}_\rho}\| \coloneqq \big(\sum_{O_k \in \mathcal{O}_\rho} (|O_k|/2^n)^2\big)^{1/2}$, and let $\mathcal{H}_\rho$ be the class of symmetric Boolean functions, defined as … Any SQ learner capable of learning $\mathcal{H}_\rho$ up to sufficiently small classification error probability $\epsilon$ …
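
The quantity $\|p_{\mathcal{O}_\rho}\|$ in Theorem 2 depends only on the orbit sizes, so it can be computed by brute force for small $n$. The sketch below is an illustration only, since the bound itself is truncated above. It assumes that $G$ acts by permuting coordinates and is given by a list of generators; the helper names `orbits` and `orbit_norm` are hypothetical. For the trivial group every orbit is a singleton and the norm is $2^{-n/2}$. For the full symmetric group the orbits are the Hamming-weight classes and the norm is $\sqrt{\binom{2n}{n}}/2^n$. Larger groups give fewer, larger orbits and hence a larger norm.

```python
import itertools
import math


def orbits(n, generators):
    # Partition {-1,+1}^n into orbits of the group generated by `generators`
    # (each maps a tuple to a tuple). For a finite group, closing under the
    # generators alone yields the full orbit.
    seen, result = set(), []
    for x in itertools.product((-1, 1), repeat=n):
        if x in seen:
            continue
        orbit, frontier = {x}, [x]
        while frontier:  # breadth-first closure under the generators
            y = frontier.pop()
            for g in generators:
                z = g(y)
                if z not in orbit:
                    orbit.add(z)
                    frontier.append(z)
        seen |= orbit
        result.append(orbit)
    return result


def orbit_norm(orbit_list, n):
    # ||p_O|| = ( sum_k (|O_k| / 2^n)^2 )^(1/2)
    return math.sqrt(sum((len(o) / 2**n) ** 2 for o in orbit_list))


n = 6
shift = lambda x: x[1:] + x[:1]          # cyclic shift: generates C_n
swap = lambda x: (x[1], x[0]) + x[2:]    # together with shift, generates S_n
print(orbit_norm(orbits(n, []), n))              # trivial group
print(orbit_norm(orbits(n, [shift]), n))         # cyclic group C_n
print(orbit_norm(orbits(n, [shift, swap]), n))   # symmetric group S_n
```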

Figures (4)

  • Figure 1: Overparameterized GNN (a) and CNN (b) fail to learn functions from the classes $\mathcal{H}_{ER,n}$ and $C^{\mathcal{B}}_{\mathcal{F}}$, respectively, either by failing to fit the training set or by overfitting the data. Plots are averaged over five random realizations.
  • Figure 2: Sample form of function $h(x)$ used in constructing $g_{S,b}({\bm{A}})$ as a GNN. For the construction, we will have $x = \sum_{i \in S} [{\bm{c}}_{{\bm{A}}}]_i$.
  • Figure 3: Replication of the experiments in Figure 1, except that here we consider a minimal architecture consisting of a single layer of graph or cyclic convolution followed by an MLP with a single hidden layer. This is the minimal number of layers needed to learn the desired function classes for the architectures considered. For the CNN plot, the jumps in the training-set MSE are due to perturbations in the loss at very low values near machine precision.
  • Figure 4: Replication of the GNN experiments with different optimizers shows that the performance of the GNN is virtually the same across optimizers. Performance is averaged over 10 runs. For each run, the learning rate is chosen by perturbing the default learning rate by a random multiplicative factor in the range $[0.1, 10]$.
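
The Figure 4 caption does not say how the multiplicative factor in $[0.1, 10]$ is drawn. One plausible reading, assumed here and not confirmed by the source, is log-uniform sampling, so that shrinking and growing the default rate by the same factor are equally likely. The helper `perturbed_lr` and the default rate `1e-3` are hypothetical.

```python
import math
import random


def perturbed_lr(default_lr, rng, low=0.1, high=10.0):
    # Assumption: the factor is drawn log-uniformly from [low, high].
    factor = math.exp(rng.uniform(math.log(low), math.log(high)))
    return default_lr * factor


rng = random.Random(0)
lrs = [perturbed_lr(1e-3, rng) for _ in range(10)]  # one learning rate per run
```

A uniform draw on $[0.1, 10]$ would instead place about 99% of the factors above 0.2, so the sampling choice noticeably changes which learning rates are explored.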

Theorems & Definitions (86)

  • Example 1: GD from $\operatorname{CSQ}$
  • Definition 1: SQ (CSQ) Learning
  • Theorem 2: Boolean SQ hardness
  • Proof sketch
  • Theorem 3: SQ hardness of $\mathcal{H}_{ER,n}$
  • Proof sketch
  • Theorem 4: Exponential CSQ lower bound for GNNs
  • Proof sketch
  • Proposition 5: $\mathsf{NP}$ hardness of GNN training; informal
  • Example 2: Frame for CNN
  • ...and 76 more
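
Frame averaging, which the abstract and Example 2 ("Frame for CNN") refer to, turns an arbitrary base network into an invariant one by averaging it over a small, input-dependent set of group elements (the frame). The sketch below uses one standard choice of frame for the cyclic group $C_n$ acting by shifts: all shifts that map the input to its lexicographically largest rotation. This choice is an assumption and may differ from the paper's Example 2; `cyclic_frame`, `frame_average`, and the tanh base network are hypothetical.

```python
import numpy as np


def cyclic_frame(x):
    # Frame for C_n: all shifts k for which np.roll(x, -k) is the
    # lexicographically largest rotation of x (ties occur for periodic x).
    rotations = [tuple(np.roll(x, -k)) for k in range(len(x))]
    best = max(rotations)
    return [k for k, r in enumerate(rotations) if r == best]


def frame_average(phi, x):
    # Invariant frame averaging: mean of phi over the frame-shifted inputs.
    return np.mean([phi(np.roll(x, -k)) for k in cyclic_frame(x)], axis=0)


rng = np.random.default_rng(0)
w = rng.standard_normal(8)
phi = lambda z: np.tanh(z @ w)  # arbitrary, non-invariant base network
x = rng.standard_normal(8)
print(frame_average(phi, x), frame_average(phi, np.roll(x, 3)))
```

Because every shift of $x$ has the same set of rotations, the frame always maps it to the same canonical vector, so the two printed values coincide. The hardness results above concern learning the full class of such frame-averaged functions, not the invariance itself.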