Symmetry-Aware Fully-Amortized Optimization with Scale Equivariant Graph Metanetworks

Bart Kuipers, Freek Byrman, Daniel Uyterlinde, Alejandro García-Castellanos

TL;DR

The paper tackles efficient optimization of related neural networks by learning weight-space mappings that exploit shared structure through symmetry-aware metanetworks. It applies ScaleGMN, the scale-equivariant graph metanetwork of Kalogeropoulos et al. (2024), which converts networks into graphs (edge features encode weights, vertex features encode biases), and learns an operator $\hat{f}_{\boldsymbol{\phi}}: \mathcal{G} \times \boldsymbol{\Theta} \to \boldsymbol{\Theta}$ that produces updated parameters $\boldsymbol{\theta}'$ in a single forward pass, trained with the objective $\mathcal{L}(\boldsymbol{\phi}; \boldsymbol{\theta}, \mathcal{B})$. The central theoretical result is that the scaling gauge freedom of a CNN layer is strictly smaller than that of an MLP layer, which helps explain the performance differences observed between architectures. Experiments on Small CNN Zoo and Small MLP Zoo demonstrate single-shot fine-tuning and include symmetry-breaking analyses; open-source code is provided.
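The graph encoding mentioned in the summary (weights as edge features, biases as vertex features) can be illustrated with a minimal sketch for a small MLP. The helper name `mlp_to_graph`, the nested-list weight layout, and the choice of `0.0` as the feature for input vertices are assumptions made for illustration; they are not taken from the paper or its code base.

```python
# Hypothetical helper (not from the paper's code): encode an MLP as a graph.
# Each neuron becomes a vertex whose feature is its bias; each weight becomes
# a directed edge whose feature is the weight value.

def mlp_to_graph(weights, biases):
    """weights[l][j][i]: weight from neuron i of layer l to neuron j of layer l+1.
    biases[l][j]: bias of neuron j of layer l+1."""
    n_in = len(weights[0][0])
    sizes = [n_in] + [len(W) for W in weights]

    # Vertex id of the first neuron in each layer.
    offsets = [0]
    for s in sizes[:-1]:
        offsets.append(offsets[-1] + s)

    # Input neurons have no bias; 0.0 is an assumed placeholder feature.
    vertex_features = [0.0] * n_in
    for b in biases:
        vertex_features.extend(b)

    edges, edge_features = [], []
    for l, W in enumerate(weights):
        for j, row in enumerate(W):
            for i, w in enumerate(row):
                edges.append((offsets[l] + i, offsets[l + 1] + j))
                edge_features.append(w)
    return vertex_features, edges, edge_features


# A 2 -> 3 -> 1 MLP with arbitrary illustrative parameters.
weights = [
    [[0.5, -1.0], [2.0, 0.0], [1.5, 0.25]],  # layer 0: 2 inputs, 3 hidden
    [[1.0, -2.0, 0.5]],                       # layer 1: 3 hidden, 1 output
]
biases = [[0.1, 0.2, 0.3], [-0.4]]

vertex_features, edges, edge_features = mlp_to_graph(weights, biases)
print(len(vertex_features), "vertices,", len(edges), "edges")
```

A metanetwork operating on such a graph sees one vertex per neuron and one edge per weight, which is what lets neuron-level symmetries be expressed as operations on vertices and their incident edges.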

Abstract

Amortized optimization accelerates the solution of related optimization problems by learning mappings that exploit shared structure across problem instances. We explore the use of Scale Equivariant Graph Metanetworks (ScaleGMNs) for this purpose. By operating directly in weight space, ScaleGMNs enable single-shot fine-tuning of existing models, reducing the need for iterative optimization. We demonstrate the effectiveness of this approach empirically and provide a theoretical result: the gauge freedom induced by scaling symmetries is strictly smaller in convolutional neural networks than in multi-layer perceptrons. This insight helps explain the performance differences observed between architectures in both our work and that of Kalogeropoulos et al. (2024). Overall, our findings underscore the potential of symmetry-aware metanetworks as a powerful approach for efficient and generalizable neural network optimization. Open-source code: https://github.com/daniuyter/scalegmn_amortization
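The scaling symmetries referred to in the abstract can be checked numerically on a toy fully-connected network. The sketch below is illustrative only and is not the authors' implementation: it uses the standard facts that ReLU commutes with positive rescaling and that tanh is odd, so rescaling a hidden neuron's incoming weights and bias by $q$ and its outgoing weights by $1/q$ leaves the network function unchanged (for $q > 0$ with ReLU, for $q = \pm 1$ with tanh). All names, shapes, and values here are assumptions.

```python
import math
import random


def forward(x, W1, b1, W2, b2, act):
    """Two-layer MLP: y = W2 @ act(W1 @ x + b1) + b2, on plain Python lists."""
    h = [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]


def rescale_hidden(W1, b1, W2, q):
    """Scale hidden neuron k's incoming weights and bias by q[k]
    and its outgoing weights by 1 / q[k]."""
    W1s = [[q[k] * w for w in row] for k, row in enumerate(W1)]
    b1s = [q[k] * b for k, b in enumerate(b1)]
    W2s = [[w / q[k] for k, w in enumerate(row)] for row in W2]
    return W1s, b1s, W2s


def relu(z):
    return max(z, 0.0)


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.uniform(-1, 1) for _ in range(n_out)]
x = [rng.uniform(-1, 1) for _ in range(n_in)]

# ReLU: any positive per-neuron scale leaves the output unchanged.
q_pos = [0.5, 2.0, 3.0, 0.1]
W1p, b1p, W2p = rescale_hidden(W1, b1, W2, q_pos)
relu_gap = max(
    abs(a - b)
    for a, b in zip(forward(x, W1, b1, W2, b2, relu),
                    forward(x, W1p, b1p, W2p, b2, relu))
)

# tanh: per-neuron sign flips leave the output unchanged.
q_sign = [1.0, -1.0, -1.0, 1.0]
W1s, b1s, W2s = rescale_hidden(W1, b1, W2, q_sign)
tanh_gap = max(
    abs(a - b)
    for a, b in zip(forward(x, W1, b1, W2, b2, math.tanh),
                    forward(x, W1s, b1s, W2s, b2, math.tanh))
)

print("ReLU output gap:", relu_gap)
print("tanh output gap:", tanh_gap)
```

Both gaps should be zero up to floating-point error. Many distinct parameter vectors therefore represent the same function, which is the redundancy a scale-equivariant metanetwork is designed to respect.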

Paper Structure

This paper contains 26 sections, 1 theorem, 25 equations, 5 figures, 5 tables.

Key Result

Lemma 1

Consider a neural network layer with $n$ input features and $m$ output features (a single feature map in the CNN case). Let $\mathcal{W}_{\mathrm{MLP}} = \mathbb{R}^{n \times m}$ be the space of weight matrices for a fully-connected layer (MLP), and let $\mathcal{W}_{\mathrm{CNN}} \subset \mathbb{R}^{n Then:

Figures (5)

  • Figure 1: Conceptual idea of our fully-amortized meta-optimizer for a low-dimensional cost function $\mathcal{C}$.
  • Figure 2: Training Dynamics. Validation loss curves comparing scale-equivariant and symmetry-broken models across different architectures and $L_1$ regularization strengths (with tanh activation).
  • Figure 3: Illustration of edge feature representations for different edge types: (a) edges corresponding to convolutional kernels and (b) edges corresponding to weights in fully connected layers, with $w_{\text{max}} = h_{\text{max}} = 3$. This figure is inspired by Figure 6 in Appendix C of Kofinas et al. (2024).
  • Figure 4: Visualization of the mapping from a CNN to its corresponding graph structure. The CNN illustrated is an example architecture from the Small CNN Zoo (Unterthiner et al., 2021). Our methodology follows the approach described in Kofinas et al. (2024) for constructing graph representations of CNNs.
  • Figure 5: Accuracy Distribution. Comparison of the post–fine-tuning accuracy distributions for SGD and ScaleGMN. The boxplots depict results on the CNN-Tanh dataset with $\lambda=0$.

Theorems & Definitions (2)

  • Lemma 1
  • proof