
Approximately Equivariant Neural Processes

Matthew Ashman, Cristiana Diaconu, Adrian Weller, Wessel Bruinsma, Richard E. Turner

TL;DR

This paper considers the use of approximately equivariant architectures in neural processes (NPs), a popular family of meta-learning models, and shows that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts.

Abstract

Equivariant deep learning architectures exploit symmetries in learning problems to improve the sample efficiency of neural-network-based models and their ability to generalise. However, when modelling real-world data, learning problems are often not exactly equivariant, but only approximately. For example, when estimating the global temperature field from weather station observations, local topographical features like mountains break translation equivariance. In these scenarios, it is desirable to construct architectures that can flexibly depart from exact equivariance in a data-driven way. Current approaches to achieving this cannot usually be applied out-of-the-box to any architecture and symmetry group. In this paper, we develop a general approach to achieving this using existing equivariant architectures. Our approach is agnostic to both the choice of symmetry group and model architecture, making it widely applicable. We consider the use of approximately equivariant architectures in neural processes (NPs), a popular family of meta-learning models. We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments, showing that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts.
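For readers new to NPs, the sketch below shows the forward pass of a plain conditional NP (CNP). Each context pair is encoded, the encodings are mean-pooled into a single representation that ignores the order of the context points, and a decoder maps each target input together with that representation to a Gaussian predictive mean and variance. This is a generic CNP with random, untrained weights and hypothetical layer sizes, written in NumPy as an illustrative assumption. It is not one of the paper's models (TNP, ConvCNP, EquivCNP) and has no equivariance built in.

```python
import numpy as np


def mlp_params(sizes, rng):
    """Random (untrained) weights for a small MLP; layer sizes are hypothetical."""
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply the MLP: ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def cnp_predict(enc, dec, xc, yc, xt):
    """Encode each context pair, mean-pool into one representation r, then
    decode [x_t, r] into a Gaussian predictive mean and variance per target."""
    h = mlp(enc, np.concatenate([xc, yc], axis=-1))      # (Nc, d)
    r = h.mean(axis=0)                                    # order of context is irrelevant
    rt = np.broadcast_to(r, (xt.shape[0], r.shape[0]))    # (Nt, d)
    out = mlp(dec, np.concatenate([xt, rt], axis=-1))     # (Nt, 2)
    mean = out[:, :1]
    var = np.log1p(np.exp(out[:, 1:])) + 1e-6             # softplus keeps variance positive
    return mean, var


rng = np.random.default_rng(0)
d = 16
enc = mlp_params([2, 32, d], rng)        # input: (x, y) with scalar x and y
dec = mlp_params([1 + d, 32, 2], rng)    # input: (x_t, r); output: (mean, raw variance)

xc = rng.uniform(-2, 2, size=(10, 1))
yc = np.sin(xc)
xt = np.linspace(-3, 3, 50)[:, None]
mean, var = cnp_predict(enc, dec, xc, yc, xt)
```

Permuting the context set leaves `mean` and `var` unchanged, because the only interaction between context points is the mean-pooling step.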

Paper Structure (68 sections, 7 theorems, 37 equations, 6 figures, 9 tables, 6 algorithms)

Key Result

Proposition 1

The ground-truth stochastic process $P$ is $G$-stationary and $\pi^{\prime}_P$ is $G$-invariant if, and only if, $\pi_{P}$ is $G$-equivariant.
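One direction of Proposition 1 can be checked numerically in a simple setting. The sketch below assumes, for concreteness, that $P$ is a Gaussian process, takes $G$ to be translations of the real line, and stands in for $\pi_P$ with the exact GP posterior (predictive mean and variance at target inputs given a context set). With a stationary RBF kernel, translating every context and target input by the same amount leaves the predictions unchanged, so the predictive map is translation-equivariant. With a kernel whose amplitude depends on input location (a hypothetical non-stationary example), it is not. The condition on $\pi^{\prime}_P$ is not modelled here.

```python
import numpy as np


def rbf(a, b, ell=1.0):
    """Stationary RBF kernel: depends only on a - b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)


def amp_rbf(a, b):
    """Hypothetical non-stationary kernel: RBF scaled by an input-dependent amplitude."""
    s = lambda x: 1.0 + x**2
    return s(a)[:, None] * rbf(a, b) * s(b)[None, :]


def gp_predict(kernel, xc, yc, xt, noise=0.1):
    """Exact GP posterior mean and variance at xt given context (xc, yc)."""
    K = kernel(xc, xc) + noise**2 * np.eye(len(xc))
    Ks = kernel(xt, xc)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yc))   # K^{-1} y
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = np.diag(kernel(xt, xt)) - np.sum(v**2, axis=0)
    return mean, var


rng = np.random.default_rng(0)
xc = rng.uniform(-2, 2, 8)
yc = np.sin(xc)
xt = np.linspace(-3, 3, 25)
tau = 1.5


def shift_gap(kernel):
    """Largest change in predictions when all inputs are translated by tau."""
    m0, v0 = gp_predict(kernel, xc, yc, xt)
    m1, v1 = gp_predict(kernel, xc + tau, yc, xt + tau)
    return np.max(np.abs(m0 - m1)) + np.max(np.abs(v0 - v1))


print(shift_gap(rbf))      # floating-point noise only: stationary prior
print(shift_gap(amp_rbf))  # clearly non-zero: stationarity is broken
```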

Figures (6)

  • Figure 1: A comparison between the predictive distributions on a single synthetic 1-D regression dataset of the TNP-, ConvCNP-, and EquivCNP-based models. For the approximately equivariant models, we plot both the model's predictive distribution (blue), as well as the predictive distributions obtained without using the fixed inputs (red). The dotted black lines indicate the target range.
  • Figure 2: A comparison between the predictive distributions of the equivariant (left column) and approximately equivariant (middle column) components of the PT-TNP (${\widetilde{T}}$) and EquivCNP (${\widetilde{E}}$) models on a single (cropped) test dataset from the 2-D environmental data experiment.
  • Figure 3: A comparison between the predictive distributions on a single synthetic 1D regression dataset of the TNP-, ConvCNP-, and EquivCNP-based models with different inductive biases (non-equivariant, equivariant, or approximately equivariant). Unlike in Figure 1, the context range only spans the low-lengthscale region. For the approximately equivariant models, we plot both the model prediction (blue), as well as the predictions obtained without using the fixed inputs, which results in a strictly equivariant model (red). The approximately equivariant models are the only ones able to correctly capture the uncertainties around the lengthscale change point ($x=0$).
  • Figure 4: A comparison between the predictive distributions on a single synthetic 1D regression dataset of the TNP-, ConvCNP-, and EquivCNP-based models with different inductive biases (non-equivariant, equivariant, or approximately equivariant). The context range only spans the high-lengthscale region. For the approximately equivariant models, we plot both the model prediction (blue), as well as the predictions obtained without using the fixed inputs, which results in a strictly equivariant model (red). Both the strictly and approximately equivariant models output predictions that closely resemble the ground truth, but the non-equivariant TNP model completely fails to generalise.
  • Figure 5: Examples of smoke simulations from the smoke plume dataset for six different combinations of smoke radius $r$ and buoyancy $B$. For each such combination, we show the resulting state for each of the three possible x-axis locations. The inputs to our models are randomly sampled $32 \times 32$ patches (indicated in red) from the $128 \times 128$ states.
  • ...and 1 more figure
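The captions of Figures 1, 3 and 4 describe approximately equivariant models whose predictions can be recomputed without the fixed inputs, which (per Figures 3 and 4) yields a strictly equivariant model. A minimal sketch of that idea, under our own assumptions: a strictly shift-equivariant map (circular 1-D convolutions with a ReLU) receives the data $f$ together with a fixed input $t$, here a hypothetical Gaussian bump standing in for a local feature such as a mountain. The kernels are random rather than learned, and this is not the TNP, ConvCNP or EquivCNP architecture used in the paper.

```python
import numpy as np


def circ_conv(x, k):
    """Circular 1-D convolution: commutes with cyclic shifts of x."""
    return sum(kj * np.roll(x, j) for j, kj in enumerate(k))


def approx_equivariant(f, t, k1, k2, k3):
    """A strictly equivariant map applied to the signal f and a fixed input t.
    It commutes with shifts applied to (f, t) jointly; holding t fixed lets the
    output depart from equivariance, and t = 0 gives a strictly equivariant map."""
    h = np.maximum(circ_conv(f, k1) + circ_conv(t, k2), 0.0)
    return circ_conv(h, k3)


rng = np.random.default_rng(0)
n, s = 64, 7
f = rng.normal(size=n)
t = np.exp(-0.5 * ((np.arange(n) - 40) / 3.0) ** 2)  # hypothetical fixed "mountain" input
k1, k2, k3 = (rng.normal(size=5) for _ in range(3))


def commutes_with_shift(t_fixed, shift_t):
    """Does shifting the input by s shift the output by s?"""
    t_in = np.roll(t_fixed, s) if shift_t else t_fixed
    lhs = approx_equivariant(np.roll(f, s), t_in, k1, k2, k3)
    rhs = np.roll(approx_equivariant(f, t_fixed, k1, k2, k3), s)
    return np.allclose(lhs, rhs)


jointly = commutes_with_shift(t, shift_t=True)           # shift f and t together
f_only = commutes_with_shift(t, shift_t=False)           # t stays pinned in place
no_fixed = commutes_with_shift(np.zeros(n), shift_t=False)  # fixed input dropped
print(jointly, f_only, no_fixed)
```

Shifting $f$ and $t$ together commutes with the map, shifting $f$ alone does not because $t$ stays at its location, and replacing $t$ by zeros restores exact equivariance.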

Theorems & Definitions (14)

  • Definition 1: $G$-equivariance
  • Definition 2: $G$-stationary stochastic process
  • Proposition 1: $G$-stationarity and $G$-equivariance
  • Proof
  • Definition 3: $G$-equivariant CNP
  • Theorem 1: Representation of $G$-equivariant CNPs (Theorem 2 of Kawano et al., 2021)
  • Proposition 2: Finite-rank approximation of compact operators (e.g., Corollary 6.2 of Brezis, 2011)
  • Theorem 2: Approximation of non-equivariant linear operators
  • Proof
  • Theorem 3: Approximation of non-equivariant operators.
  • ...and 4 more
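Proposition 2 and Theorems 2 and 3 concern approximating non-equivariant operators. One way to see how fixed inputs can make this possible is sketched below in finite dimensions, under our own simplifying assumptions: signals live in $\mathbb{R}^n$, $G$ is the group of cyclic shifts, a matrix $T$ stands in for a compact operator, and a truncated SVD plays the role of the finite-rank approximation. The map $E(f, \phi_{1:r}, \psi_{1:r}) = \sum_i \langle f, \phi_i \rangle \psi_i$ is equivariant when all of its arguments are shifted together, yet holding $\phi_i, \psi_i$ fixed reproduces the rank-$r$ truncation of an arbitrary $T$. This illustrates the flavour of the results only; it does not reproduce the paper's statements or proofs.

```python
import numpy as np


def E(f, phis, psis):
    """Jointly shift-equivariant map: sum_i <f, phi_i> psi_i.
    Cyclic shifts preserve inner products, so shifting f, phis and psis
    together shifts the output."""
    return sum(np.dot(f, phi) * psi for phi, psi in zip(phis, psis))


def fixed_inputs_from(T, rank):
    """Truncated SVD T ~ U_r diag(s_r) V_r^T, returned as fixed inputs (phis, psis)."""
    U, s, Vt = np.linalg.svd(T)
    phis = [Vt[i] for i in range(rank)]
    psis = [s[i] * U[:, i] for i in range(rank)]
    return phis, psis


rng = np.random.default_rng(0)
n = 32
T = rng.normal(size=(n, n))  # arbitrary, non-equivariant linear operator
f = rng.normal(size=n)

# With full rank, E with fixed inputs reproduces T exactly (up to rounding).
phis, psis = fixed_inputs_from(T, rank=n)
print(np.allclose(E(f, phis, psis), T @ f))

# E itself is equivariant when the fixed inputs are shifted along with f.
s = 5
lhs = E(np.roll(f, s), [np.roll(p, s) for p in phis], [np.roll(q, s) for q in psis])
print(np.allclose(lhs, np.roll(E(f, phis, psis), s)))

# Fewer fixed inputs give a coarser approximation.
for r in (4, 16, 32):
    p_r, q_r = fixed_inputs_from(T, r)
    T_r = sum(np.outer(q, p) for p, q in zip(p_r, q_r))
    print(r, np.linalg.norm(T - T_r, 2))
```

The spectral-norm error of the rank-$r$ truncation equals the $(r+1)$-th singular value of $T$ (Eckart-Young), so adding fixed inputs tightens the approximation.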