Symmetry-Aware Fully-Amortized Optimization with Scale Equivariant Graph Metanetworks

Bart Kuipers; Freek Byrman; Daniel Uyterlinde; Alejandro García-Castellanos

TL;DR

The paper tackles efficient optimization of related neural networks by learning weight-space mappings that exploit shared structure through symmetry-aware metanetworks. It applies ScaleGMN, the scale-equivariant graph metanetwork of Kalogeropoulos et al. (2024), which converts a network into a graph (edge features encode weights, vertex features encode biases), and learns an operator $\hat{f}_{\boldsymbol{\phi}}: \mathcal{G} \times \boldsymbol{\Theta} \to \boldsymbol{\Theta}$ that produces updated parameters $\boldsymbol{\theta}'$ in a single forward pass, guided by the objective $\mathcal{L}(\boldsymbol{\phi}; \boldsymbol{\theta}, \mathcal{B})$. The central theoretical result shows that the scaling gauge freedom of a CNN layer is strictly smaller than that of an MLP layer, which helps explain why symmetry-aware optimization yields larger gains for MLP-like inputs. Experiments on Small CNN Zoo and Small MLP Zoo demonstrate effective single-shot optimization and include symmetry-breaking analyses; the code is open source. Overall, the results point to symmetry-aware metanetworks as a promising approach to efficient, generalizable neural-network optimization.
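To make the graph encoding above concrete, here is a minimal sketch of turning a small MLP into a graph whose vertices carry biases and whose edges carry weights. The names `WeightGraph` and `mlp_to_graph` are hypothetical and chosen for illustration; this is not the authors' implementation, and it assumes a plain fully connected network given as nested lists.

```python
from dataclasses import dataclass, field


@dataclass
class WeightGraph:
    """Hypothetical container: one vertex per neuron, one edge per weight."""
    vertex_bias: list                          # bias feature per vertex (0.0 for input neurons)
    edges: dict = field(default_factory=dict)  # (src_vertex, dst_vertex) -> weight feature


def mlp_to_graph(weights, biases):
    """Build a WeightGraph from an MLP.

    weights[l][j][i] is the weight from neuron i in layer l to neuron j in
    layer l + 1, and biases[l][j] is the bias of neuron j in layer l + 1.
    """
    n_inputs = len(weights[0][0])
    vertex_bias = [0.0] * n_inputs  # input neurons have no bias (assumption of this sketch)
    edges = {}
    src_offset = 0                  # index of the first vertex of the source layer
    for W, b in zip(weights, biases):
        fan_in = len(W[0])
        dst_offset = src_offset + fan_in
        for j, row in enumerate(W):
            vertex_bias.append(b[j])  # vertex feature: bias of destination neuron j
            for i, w in enumerate(row):
                # edge feature: weight connecting source neuron i to destination neuron j
                edges[(src_offset + i, dst_offset + j)] = w
        src_offset = dst_offset
    return WeightGraph(vertex_bias, edges)


# Usage: a 2-3-1 MLP yields 6 vertices and 9 edges.
example_weights = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [[7.0, 8.0, 9.0]]]
example_biases = [[0.1, 0.2, 0.3], [0.4]]
example_graph = mlp_to_graph(example_weights, example_biases)
```

A metanetwork can then run message passing over such a graph; how ScaleGMN makes those messages scale-equivariant is beyond this sketch.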

Abstract

Amortized optimization accelerates the solution of related optimization problems by learning mappings that exploit shared structure across problem instances. We explore the use of Scale Equivariant Graph Metanetworks (ScaleGMNs) for this purpose. By operating directly in weight space, ScaleGMNs enable single-shot fine-tuning of existing models, reducing the need for iterative optimization. We demonstrate the effectiveness of this approach empirically and provide a theoretical result: the gauge freedom induced by scaling symmetries is strictly smaller in convolutional neural networks than in multi-layer perceptrons. This insight helps explain the performance differences observed between architectures in both our work and that of Kalogeropoulos et al. (2024). Overall, our findings underscore the potential of symmetry-aware metanetworks as a powerful approach for efficient and generalizable neural network optimization. Open-source code: https://github.com/daniuyter/scalegmn_amortization
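The scaling symmetries mentioned above can be illustrated with the standard positive-rescaling symmetry of ReLU networks: multiplying a hidden neuron's incoming weights and bias by some $q > 0$ and dividing its outgoing weights by $q$ leaves the network function unchanged. The sketch below checks this numerically on a tiny MLP. The functions `forward` and `rescale_hidden` and all numeric values are hypothetical, chosen only for illustration, and the sketch does not reproduce the paper's CNN-versus-MLP theorem.

```python
import math


def relu(x):
    return max(0.0, x)


def forward(W1, b1, W2, b2, x):
    """Two-layer ReLU MLP: x -> W2 @ relu(W1 @ x + b1) + b2."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


def rescale_hidden(W1, b1, W2, q):
    """Apply one positive scale q[j] per hidden neuron j.

    Incoming weights and bias are multiplied by q[j]; outgoing weights are
    divided by q[j]. For ReLU and q[j] > 0 the function is unchanged.
    """
    W1_scaled = [[qj * w for w in row] for row, qj in zip(W1, q)]
    b1_scaled = [qj * b for b, qj in zip(b1, q)]
    W2_scaled = [[w / qj for w, qj in zip(row, q)] for row in W2]
    return W1_scaled, b1_scaled, W2_scaled


# Illustrative parameters (not taken from the paper).
W1 = [[1.0, -2.0], [0.5, 3.0], [-1.5, 0.25]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.05]
x = [0.7, -1.3]
q = [2.0, 0.5, 4.0]  # one positive scale per hidden neuron

W1_s, b1_s, W2_s = rescale_hidden(W1, b1, W2, q)
original_out = forward(W1, b1, W2, b2, x)
rescaled_out = forward(W1_s, b1_s, W2_s, b2, x)
same_function = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
                    for a, b in zip(original_out, rescaled_out))
```

Here the parameters differ but `same_function` evaluates to true, which is the redundancy a scale-equivariant metanetwork is designed to respect.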
