Symmetry-Aware Graph Metanetwork Autoencoders: Model Merging through Parameter Canonicalization

Odysseas Boufalis, Jorge Carrasco-Pollo, Joshua Rosenthal, Eduardo Terres-Caballero, Alejandro García-Castellanos

TL;DR

The paper tackles model merging in neural networks by addressing parameter-space symmetries, notably permutations and scalings, which create many equivalent minima. It introduces a symmetry-aware autoencoder using ScaleGMNs (and compares to Neural Graphs) to canonically map networks into a symmetry-invariant latent space, bypassing explicit combinatorial alignment. Functionally equivalent networks are reconstructed to a canonical weight configuration, enabling near-linear interpolation and effective merging for implicit neural representations and CNNs. The approach also extends the Git Re-Basin baseline to handle sign-flips and analyzes the role of scaling symmetries through latent-space visualizations and interpolation experiments. Overall, the method yields improved linear mode connectivity with linear-time inference and suggests a scalable path toward symmetry-aware canonicalization for broader architectures.

Abstract

Neural network parameterizations exhibit inherent symmetries that yield multiple equivalent minima within the loss landscape. Scale Graph Metanetworks (ScaleGMNs) explicitly leverage these symmetries by proposing an architecture equivariant to both permutation and parameter scaling transformations. Previous work by Ainsworth et al. (2023) addressed permutation symmetries through a computationally intensive combinatorial assignment problem, demonstrating that leveraging permutation symmetries alone can map networks into a shared loss basin. In this work, we extend their approach by also incorporating scaling symmetries, presenting an autoencoder framework utilizing ScaleGMNs as invariant encoders. Experimental results demonstrate that our method aligns Implicit Neural Representations (INRs) and Convolutional Neural Networks (CNNs) under both permutation and scaling symmetries without explicitly solving the assignment problem. This approach ensures that similar networks naturally converge within the same basin, facilitating model merging, i.e., smooth linear interpolation while avoiding regions of high loss. The code is publicly available on our GitHub repository.
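The symmetries described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the one-hidden-layer tanh MLP, the helper names `mlp` and `apply_symmetry`, and all sizes are assumptions chosen for illustration. Sign flips are used as the scaling symmetry here because tanh is an odd function, so negating a hidden unit's incoming and outgoing weights cancels out.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer tanh MLP; x has shape (n_samples, d_in)."""
    hidden = np.tanh(x @ W1.T + b1)
    return hidden @ W2.T + b2

def apply_symmetry(W1, b1, W2, b2, perm, signs):
    """Reorder hidden units by `perm` and flip their signs by `signs`.

    Since tanh(-z) = -tanh(z), negating a unit's incoming weights and bias
    together with its outgoing weights leaves the network output unchanged.
    """
    W1_t = signs[:, None] * W1[perm]      # rows = hidden units
    b1_t = signs * b1[perm]
    W2_t = W2[:, perm] * signs[None, :]   # columns = hidden units
    return W1_t, b1_t, W2_t, b2

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 3, 5, 2           # illustrative sizes
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

# A fixed, non-trivial permutation and sign pattern (hypothetical choice).
perm = np.roll(np.arange(d_hidden), 1)
signs = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
W1_t, b1_t, W2_t, b2_t = apply_symmetry(W1, b1, W2, b2, perm, signs)

x = rng.normal(size=(8, d_in))
y = mlp(x, W1, b1, W2, b2)
y_t = mlp(x, W1_t, b1_t, W2_t, b2_t)

# Naive midpoint in parameter space between the two equivalent networks.
mid = [0.5 * (a + b) for a, b in
       zip((W1, b1, W2, b2), (W1_t, b1_t, W2_t, b2_t))]
y_mid = mlp(x, *mid)

print("same function after symmetry:", np.allclose(y, y_t))
print("max deviation at naive midpoint:", np.abs(y - y_mid).max())
```

The two parameter sets compute the same function, yet their naive midpoint generally does not. This is the situation that mapping equivalent networks to a single canonical configuration is meant to avoid before interpolating or merging.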

Paper Structure

This paper contains 31 sections, 1 theorem, 8 equations, 9 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Suppose the activation functions of the network are either $\sin$ or $\tanh$. Then, for a fixed cost matrix $\boldsymbol{C}_{\ell}$, the optimal transformation $\boldsymbol{T}_{\ell} = \boldsymbol{P}^*_{\ell} \boldsymbol{Q}^*_{\ell}$ that maximizes the objective in Equation LAP_3 is given by:

Figures (9)

  • Figure 1: Autoencoder architecture for neural network canonicalization using a permutation and scaling-invariant ScaleGMN encoder, MLP decoder, and functional loss to preserve network equivalence.
  • Figure 2: Comparison between the ground truth (top row) and reconstruction obtained with the different types of INR autoencoders (bottom row), for a set of distinct MNIST digits.
  • Figure 3: Interpolation experiments comparing different neural network alignment methods across various perturbation scenarios.
  • Figure 4: Average interpolation curves (and standard deviation) over 20 pairs of distinct CNN models with ReLU activation.
  • Figure 5: Average interpolation curves (and standard deviation) over 20 pairs of distinct CNN models with Tanh activation.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proof