
Symmetry From Scratch: Group Equivariance as a Supervised Learning Task

Haozhe Huang, Leo Kaixuan Cheng, Kaiwen Chen, Alán Aspuru-Guzik

TL;DR

General machine learning architectures can learn symmetries directly, as a supervised learning task, from group-equivariant architectures, and can then retain or break the learned symmetry for downstream tasks. This simple formulation enables models with group-agnostic architectures to capture the inductive bias of group-equivariant architectures.

Abstract

In machine learning datasets with symmetries, the paradigm for backward compatibility with symmetry-breaking has been to relax equivariant architectural constraints, engineering extra weights to differentiate symmetries of interest. However, this process becomes increasingly over-engineered as models are geared towards specific symmetries/asymmetries hardwired into a particular set of equivariant basis functions. In this work, we introduce symmetry-cloning, a method for inducing equivariance in machine learning models. We show that general machine learning architectures (i.e., MLPs) can learn symmetries directly as a supervised learning task from group-equivariant architectures and retain/break the learned symmetry for downstream tasks. This simple formulation enables machine learning models with group-agnostic architectures to capture the inductive bias of group-equivariant architectures.
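To make the idea of learning a symmetry as a supervised task concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy setting that the abstract does not specify: the equivariant "teacher" is a 1-D circular cross-correlation (exactly equivariant to cyclic shifts), the group-agnostic "student" is a single dense linear layer, and training is plain gradient descent on a mean squared error. The names `circular_conv`, `teacher_matrix`, and `clone_symmetry` are hypothetical.

```python
import numpy as np


def circular_conv(x, kernel):
    """Teacher: out[i] = sum_j kernel[j] * x[(i + j) mod n] along the last axis.

    This map commutes with cyclic shifts of x, so it is shift-equivariant.
    """
    return sum(kernel[j] * np.roll(x, -j, axis=-1) for j in range(len(kernel)))


def teacher_matrix(kernel, n):
    """Dense (circulant) matrix that performs the same map as circular_conv."""
    T = np.zeros((n, n))
    for i in range(n):
        for j, k in enumerate(kernel):
            T[i, (i + j) % n] = k
    return T


def clone_symmetry(kernel, n=8, n_samples=512, steps=500, lr=0.1, seed=0):
    """Student: an unconstrained n x n weight matrix fit to the teacher's outputs.

    Inputs are random vectors; labels are the teacher's outputs on them.
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n))   # training inputs
    Y = circular_conv(X, kernel)          # training labels from the teacher
    W = np.zeros((n, n))                  # student weights, no symmetry built in
    for _ in range(steps):
        residual = X @ W.T - Y
        grad = 2.0 * residual.T @ X / n_samples  # gradient of the mean squared error
        W -= lr * grad
    return W
```

After training, the student can be checked for equivariance by comparing `W @ np.roll(x, s)` with `np.roll(W @ x, s)` for a shift `s`. In this linear toy case the student recovers the teacher's circulant structure up to numerical error; a nonlinear MLP, as studied in the paper, would only be expected to be approximately equivariant.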

Paper Structure (16 sections, 4 equations, 7 figures, 1 table, 1 algorithm)

Figures (7)

  • Figure 1: Our model-agnostic symmetry-cloning pipeline for learning approximately equivariant model parameters. The supervised learning framework takes feature maps and the parameters for the equivariant architecture as training data and uses the corresponding output as training labels. The pipeline can be used to train a layer of any group-agnostic architecture.
  • Figure 2: a) Original MNIST dataset (left), b) dataset under $T(2)$ translational transformations (middle), and c) dataset under $C_4$ discrete rotational transformations (right).
  • Figure 4: Parameter matrix extracted from the learned layer of $T(2)$-cloning (left); corresponding Toeplitz matrix generated by unrolling convolution (right).
  • Figure 5: Feature maps with $T(2)$-cloned models. The first column shows the original and translated images; the remaining columns show feature maps from a CNN layer, mlp2cnn, 7-block mlp2cnn, and an untrained MLP layer, respectively.
  • Figure 6: Feature maps with $C_4$-cloned models. The first column shows the original and rotated images; the remaining columns show feature maps from a GCNN layer, mlp2gcnn, 7-block mlp2gcnn, and an untrained MLP layer, respectively.
  • ...and 2 more figures
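
The Figure 4 caption compares a learned parameter matrix with the Toeplitz matrix obtained by unrolling a convolution. The sketch below shows one common way to build such a matrix. It makes assumptions that the caption does not state: a single channel, stride 1, no padding ("valid" output), and the cross-correlation convention used by most deep learning frameworks. The function names `unroll_conv2d` and `correlate2d_valid` are hypothetical.

```python
import numpy as np


def unroll_conv2d(kernel, height, width):
    """Build the matrix M such that M @ image.ravel() equals the flattened
    valid cross-correlation of a (height x width) image with `kernel`.

    Each row of M holds the kernel entries at the flattened positions of one
    sliding window, which gives M a doubly block Toeplitz structure.
    """
    kh, kw = kernel.shape
    out_h, out_w = height - kh + 1, width - kw + 1
    M = np.zeros((out_h * out_w, height * width))
    for r in range(out_h):
        for c in range(out_w):
            for a in range(kh):
                for b in range(kw):
                    M[r * out_w + c, (r + a) * width + (c + b)] = kernel[a, b]
    return M


def correlate2d_valid(image, kernel):
    """Reference sliding-window cross-correlation, used only to check M."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out
```

Applying `unroll_conv2d(kernel, H, W) @ image.ravel()` and reshaping to the output size should reproduce `correlate2d_valid(image, kernel)`. A dense layer trained to imitate a convolution can then be compared entry by entry against this matrix.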