Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Jinwoo Kim, Tien Dat Nguyen, Ayhan Suleymanzade, Hyeokjun An, Seunghoon Hong

TL;DR

This work introduces probabilistic symmetrization, a general framework that makes an arbitrary base model equivariant to a group G by averaging it over group transformations sampled from a small, learned equivariant distribution p_ω(g|x) conditioned on the input. By enforcing probabilistic G-equivariance of p_ω and combining it with a universal base function f_θ (an MLP or a transformer), the method guarantees G-equivariance in expectation and universal approximation of invariant and equivariant targets. The authors instantiate p_ω for a range of practical groups (S_n, O(n)/SO(n), E(n)/SE(n), and their products) using lightweight, input-conditioned neural components (e.g., GNNs, Gram–Schmidt, differentiable relaxations). They report competitive or superior performance on graph isomorphism, n-body dynamics, and real-world graph datasets, often when transferring pretrained Vision Transformer weights. The approach decouples symmetry handling from the base architecture, which enables transfer learning and application to diverse domains; the trade-off is the sampling cost of end-to-end learned equivariance.

Abstract

We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries. In contrast to equivariant architectures, we use an arbitrary base model such as an MLP or a transformer and symmetrize it to be equivariant to the given group by employing a small equivariant network that parameterizes the probability distribution underlying the symmetrization. The distribution is trained end-to-end with the base model, which can maximize performance while reducing the sample complexity of symmetrization. We show that this approach ensures not only equivariance to the given group but also universal approximation capability in expectation. We implement our method on various base models, including patch-based transformers that can be initialized from pretrained vision transformers, and test them on a wide range of symmetry groups, including permutation and Euclidean groups and their combinations. Empirical tests show competitive results against tailored equivariant architectures, suggesting the potential for learning equivariant functions for diverse groups using a non-equivariant universal base architecture. We further show evidence of enhanced learning in symmetric modalities, like graphs, when pretrained from non-symmetric modalities, like vision. Code is available at https://github.com/jw9730/lps.
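
The abstract does not spell out the construction itself. As a sketch in the notation of the paper's theorem and figure captions, and writing $g\cdot$ for the group action on inputs and outputs (a shorthand assumed here), the symmetrized function is an expectation over group elements drawn from the learned distribution, which in practice would be estimated from $N$ samples:

$$\phi_{\theta,\omega}(\mathbf{x}) = \mathbb{E}_{g\sim p_\omega(g|\mathbf{x})}\left[\,g\cdot f_\theta(g^{-1}\cdot\mathbf{x})\,\right] \approx \frac{1}{N}\sum_{i=1}^{N} g_i\cdot f_\theta(g_i^{-1}\cdot\mathbf{x}), \qquad g_i\sim p_\omega(g|\mathbf{x}).$$

Under this reading, equivariance of the distribution means $p_\omega(g|\mathbf{x}) = p_\omega(g'g\,|\,g'\cdot\mathbf{x})$ for every $g'\in G$: transforming the input by $g'$ shifts the sampled group elements by the same $g'$. For invariant targets the action on the output is trivial, so the outer $g\cdot$ drops out.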

Paper Structure

This paper contains 57 sections, 16 theorems, 53 equations, 7 figures, 7 tables.

Key Result

Theorem 1

If $p_\omega$ is $G$ equivariant, then $\phi_{\theta,\omega}$ is $G$ equivariant for arbitrary $f_\theta$.
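
Theorem 1 can be checked numerically on a toy case. The sketch below assumes the symmetrized function has the form $\phi(\mathbf{x}) = \mathbb{E}[g\cdot f(g^{-1}\cdot\mathbf{x})]$, takes $G = S_n$ acting by row permutation, and uses a hypothetical stand-in for $p_\omega$: an argsort of a linear row score plus i.i.d. Gaussian noise. The names `base_fn`, `sample_perms`, and `symmetrize` are illustrative and not taken from the authors' code. Because i.i.d. noise has the same distribution after its entries are permuted, the check couples the noise across the two calls (`eps[:, q]`), so that equivariance can be compared sample by sample.

```python
import numpy as np

def base_fn(x, W1, W2):
    """Unconstrained base function: an MLP on the flattened input (not equivariant)."""
    n = x.shape[0]
    hidden = np.tanh(x.reshape(-1) @ W1)
    return (hidden @ W2).reshape(n, -1)

def sample_perms(x, w, eps):
    """Toy stand-in for p(g|x): sort rows by a linear score plus i.i.d. noise.

    eps has shape (num_samples, n); each row of the result is one permutation.
    """
    scores = x @ w + eps
    return np.argsort(scores, axis=1)

def symmetrize(x, w, eps, W1, W2):
    """Monte Carlo estimate of E[g . f(g^{-1} . x)] over sampled permutations."""
    outputs = []
    for perm in sample_perms(x, w, eps):
        y = base_fn(x[perm], W1, W2)  # f applied to the reordered input
        out = np.empty_like(y)
        out[perm] = y                 # undo the reordering on the output rows
        outputs.append(out)
    return np.mean(outputs, axis=0)

rng = np.random.default_rng(0)
n, d, k, hidden_dim, num_samples = 5, 3, 2, 16, 8
W1 = rng.normal(size=(n * d, hidden_dim))
W2 = rng.normal(size=(hidden_dim, n * k))
w = rng.normal(size=d)
x = rng.normal(size=(n, d))
eps = rng.normal(size=(num_samples, n))
q = np.roll(np.arange(n), 1)  # a fixed non-identity permutation of the rows

# Does permuting the input permute the output the same way?
base_equivariant = np.allclose(base_fn(x[q], W1, W2), base_fn(x, W1, W2)[q])
sym_equivariant = np.allclose(
    symmetrize(x[q], w, eps[:, q], W1, W2),
    symmetrize(x, w, eps, W1, W2)[q],
)
print("base equivariant:", base_equivariant)
print("symmetrized equivariant:", sym_equivariant)
```

With random weights, the first check is expected to fail and the second to pass: the base MLP has no symmetry of its own, and the equivariance comes entirely from how the permutations are sampled.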

Figures (7)

  • Figure 1: Overview of probabilistic symmetrization. We symmetrize an unconstrained base function $f_\theta$ into an equivariant function $\phi_{\theta,\omega}$ for group $G$ using a learned equivariant distribution $p_\omega(g|\mathbf{x})$.
  • Figure 2: Learned $p_\omega(g|\mathbf{x})$ over time. The entropy of aggregated permutation matrices $\bar{\mathbf{P}} = \sum\mathbf{P}_g/N$ from $\mathbf{P}_g\sim p_\omega(g|\mathbf{x})$ for each input $\mathbf{x}$ drops in early training, indicating that the distribution learns to produce lower-variance permutations, as shown in the visualizations below.
  • Figure 3: Visual illustration of the symmetrization methods based on the probabilities assigned over the partition of the group $G$ into orbits $G_\mathbf{x}g$. Note that, while we use concentric circles of different perimeters to illustrate each orbit, all orbits actually have an identical cardinality $|G_\mathbf{x}g|=|G_\mathbf{x}|$.
  • Figure 4: Test accuracy of MLP $f_\theta$ symmetrized by equivariant distribution $p_\omega(g|\mathbf{x})$ and trained on the EXP-classify dataset across a range of training sample sizes. Inference sample size is set to 10.
  • Figure 6: Variance of the estimate of MLP $f_\theta$ symmetrized by equivariant distribution $p_\omega(g|\mathbf{x})$ and trained on the EXP-classify dataset for a range of training and inference sample sizes.
  • ...and 2 more figures
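
The quantity tracked in Figure 2 can be sketched as follows. The caption defines the aggregated matrix $\bar{\mathbf{P}} = \sum\mathbf{P}_g/N$ but not how its entropy is computed, so this example assumes the Shannon entropy of each row of $\bar{\mathbf{P}}$ (each row sums to 1), averaged over rows. The helper names `perm_matrix` and `aggregated_entropy` are hypothetical.

```python
import numpy as np

def perm_matrix(perm):
    """Permutation matrix P with P[perm[i], i] = 1."""
    n = len(perm)
    P = np.zeros((n, n))
    P[perm, np.arange(n)] = 1.0
    return P

def aggregated_entropy(perms):
    """Mean row-wise Shannon entropy of the averaged permutation matrix.

    Assumption: the row-wise definition is chosen for illustration only.
    """
    P_bar = np.mean([perm_matrix(p) for p in perms], axis=0)
    safe = np.where(P_bar > 0, P_bar, 1.0)  # log(1) = 0 where P_bar is zero
    row_entropy = -(P_bar * np.log(safe)).sum(axis=1)
    return row_entropy.mean()

# Identical samples give a single sharp permutation matrix.
same = [np.arange(4)] * 3
# All cyclic shifts spread the mass uniformly over every entry.
shifts = [np.roll(np.arange(4), s) for s in range(4)]

print(aggregated_entropy(same))
print(aggregated_entropy(shifts))
```

Under this definition the entropy is zero when every sample is the same permutation and reaches $\log n$ when the averaged matrix is uniform, so a falling curve corresponds to samples that agree more with each other.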

Theorems & Definitions (34)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • ...and 24 more