Learning Symmetries via Weight-Sharing with Doubly Stochastic Tensors

Putri A. van der Linden, Alejandro García-Castellanos, Sharvaree Vadgama, Thijs P. Kuipers, Erik J. Bekkers

TL;DR

The paper addresses the limitation of fixed, pre-specified symmetry biases in deep learning by introducing Weight Sharing Convolutional Networks (WSCNNs), which learn symmetry-driven kernel sharing via stacks of learnable doubly stochastic matrices, i.e., soft permutations. Double stochasticity is enforced through the Sinkhorn operator, and the weight-sharing patterns are optimized jointly with the base kernels. This yields exact regular group convolutions when the data is strongly symmetric and flexible discovery of partial symmetries otherwise. Empirically, WSCNNs match or surpass fixed-symmetry models on rotated MNIST and CIFAR-10 while using fewer parameters, and their learned representations align with the underlying group-like transformations. The work provides a practical pathway to data-driven symmetry discovery, with implications for parameter efficiency and robustness in symmetry-aware learning systems.

Abstract

Group equivariance has emerged as a valuable inductive bias in deep learning, enhancing generalization, data efficiency, and robustness. Classically, group equivariant methods require the groups of interest to be known beforehand, which may not be realistic for real-world data. Additionally, baking in fixed group equivariance may impose overly restrictive constraints on model architecture. This highlights the need for methods that can dynamically discover and apply symmetries as soft constraints. For neural network architectures, equivariance is commonly achieved through group transformations of a canonical weight tensor, resulting in weight sharing over a given group $G$. In this work, we propose to learn such a weight-sharing scheme by defining a collection of learnable doubly stochastic matrices that act as soft permutation matrices on canonical weight tensors, which can take regular group representations as a special case. This yields learnable kernel transformations that are jointly optimized with downstream tasks. We show that when the dataset exhibits strong symmetries, the permutation matrices will converge to regular group representations and our weight-sharing networks effectively become regular group convolutions. Additionally, the flexibility of the method enables it to effectively pick up on partial symmetries.
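
The weight-sharing scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `sinkhorn` and `shared_kernel_stack`, the tensor shapes, the iteration count, and the random initialization are all assumptions. Only two details come from the page itself: double stochasticity is enforced with Sinkhorn normalization, and the first representation is fixed to the identity (see the Figure 2 caption).

```python
import numpy as np

def sinkhorn(logits, n_iters=100):
    """Map an unconstrained square matrix to an approximately doubly stochastic one."""
    log_p = np.asarray(logits, dtype=np.float64)
    for _ in range(n_iters):
        # Alternate row and column normalization in log space for stability.
        log_p = log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        log_p = log_p - np.logaddexp.reduce(log_p, axis=0, keepdims=True)
    return np.exp(log_p)

def shared_kernel_stack(base_kernels, rep_logits):
    """Apply a stack of soft permutations to flattened base kernels.

    base_kernels: (c_out, c_in, d) flattened kernels (hypothetical layout).
    rep_logits:   (n_elem - 1, d, d) unconstrained learnable parameters.
    """
    d = base_kernels.shape[-1]
    # The first representation is the identity, so the first slice of the
    # output stack holds the raw base kernels.
    soft_perms = [np.eye(d)] + [sinkhorn(l) for l in rep_logits]
    reps = np.stack(soft_perms)  # (n_elem, d, d)
    # kernels[g, o, c, i] = sum_j reps[g, i, j] * base_kernels[o, c, j]
    kernels = np.einsum("gij,ocj->goci", reps, base_kernels)
    return reps, kernels

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 1, 25))      # 8 output channels, 1 input channel, 5x5 kernels
logits = rng.normal(size=(3, 25, 25))   # 3 learnable representations plus the identity
reps, kernels = shared_kernel_stack(base, logits)
kernel_stack = kernels.reshape(4, 8, 1, 5, 5)
```

In a trainable version, `logits` and `base` would both be parameters updated by the task loss, which is what makes the sharing pattern and the kernels jointly optimized.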

Paper Structure

This paper contains 46 sections, 15 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Kernel stacks are acquired through a learned weight-sharing scheme applied to a set of flattened base kernels.
  • Figure 2: Learned kernels from the lifting layer of WSCNN, applied to rotated MNIST and reshaped to $[\mathrm{No. elem.}, C_{out}]$. Since $\mathbf{R}_1$ is set as the identity operator, the first column displays the raw kernels.
  • Figure 3: Comparison of $C_4$ representations and the representation stack learned by the lifting layer on the rotated MNIST dataset. Top: learned representations. Bottom: permutations for $C_4$ on $d=25$.
  • Figure 4: Samples of the two tasks.
  • Figure 5: Coefficient responses of learned representations and their base transformations.
  • ...and 8 more figures
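
For reference, the exact $C_4$ permutations on $d=25$ mentioned in the Figure 3 caption can be constructed directly. The sketch below is an assumption-laden illustration: it assumes row-major flattening of a $5 \times 5$ kernel and counterclockwise rotation as implemented by `np.rot90`, and the helper name `c4_permutations` is hypothetical. The paper's own conventions may differ.

```python
import numpy as np

def c4_permutations(k=5):
    """Return the four permutation matrices rotating a flattened k x k kernel by 0, 90, 180, 270 degrees."""
    d = k * k
    # idx[i] is the source position of the entry that lands at position i after one rotation.
    idx = np.rot90(np.arange(d).reshape(k, k)).ravel()
    p = np.eye(d)[idx]  # p @ v == v[idx]
    return np.stack([np.linalg.matrix_power(p, n) for n in range(4)])

perms = c4_permutations(5)
kernel = np.arange(25.0).reshape(5, 5)
# Acting on the flattened kernel matches rotating the 2D kernel.
rotated = (perms[1] @ kernel.ravel()).reshape(5, 5)
```

Each of these matrices is a hard permutation and therefore doubly stochastic, so it lies inside the set of matrices that a learned soft permutation can approach.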