Graph Metanetworks for Processing Diverse Neural Architectures

Derek Lim; Haggai Maron; Marc T. Law; Jonathan Lorraine; James Lucas


TL;DR

Graph Metanetworks (GMNs) treat neural networks as data: they encode an input architecture as a graph and process it with a graph neural network so that parameter-space symmetries are respected. The core ideas are neural DAG automorphisms, which capture permutation symmetries, and two graph representations (computation graphs and parameter graphs), with parameter graphs offering a scalable, symmetry-preserving substrate for diverse modules, including attention and normalization. Theoretical results show that GMNs are equivariant to input parameter permutations and at least as expressive as prior metanet approaches, while experiments demonstrate superior performance on tasks such as predicting network accuracy, editing implicit neural representations (INRs), and self-supervised learning with INRs across diverse architectures. This enables architecture-agnostic metanet analysis, with practical implications for metamodeling, model search, and the analysis of state-of-the-art networks across CNNs, Transformers, INRs, and beyond.

Abstract

Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
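As a rough illustration of the graph-building step, the sketch below turns a small MLP into a parameter graph in which every weight and bias becomes exactly one edge, with one extra bias node per layer (as the Figure 1 caption describes). The node naming scheme (`n{layer}_{index}`, `bias{layer}`) and the function `mlp_parameter_graph` are hypothetical, and the paper's actual construction may differ in details such as node and edge features.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source node name, destination node name)


def mlp_parameter_graph(
    weights: List[List[List[float]]],
    biases: List[List[float]],
) -> Dict[Edge, float]:
    """Map each MLP parameter to one directed edge with the parameter as its attribute.

    weights[l] is an (out x in) matrix given as a list of rows;
    biases[l] is a list with one entry per output neuron of layer l.
    """
    edges: Dict[Edge, float] = {}
    for l, (W, b) in enumerate(zip(weights, biases)):
        for j, row in enumerate(W):
            dst = f"n{l + 1}_{j}"
            # One edge per weight, from input neuron i to output neuron j.
            for i, w in enumerate(row):
                edges[(f"n{l}_{i}", dst)] = w
            # One edge per bias, from the layer's bias node to neuron j.
            edges[(f"bias{l + 1}", dst)] = b[j]
    return edges


# A 2 -> 3 -> 1 MLP with arbitrary illustrative values.
weights = [
    [[0.1, -0.2], [0.3, 0.4], [0.5, 0.6]],  # layer 1: 3 x 2
    [[1.0, -1.0, 0.5]],                     # layer 2: 1 x 3
]
biases = [[0.0, 0.1, -0.1], [0.2]]

graph = mlp_parameter_graph(weights, biases)
num_params = sum(len(r) for W in weights for r in W) + sum(len(b) for b in biases)
print(len(graph), num_params)
```

In this sketch the number of edges equals the number of parameters, which is the property that makes a parameter-indexed graph attractive as input to a graph neural network operating on edge attributes.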


Paper Structure (49 sections, 12 theorems, 38 equations, 8 figures, 5 tables)


Introduction
Graph Automorphism-based Metanets
Graph construction for general feedforward architectures
Computation graphs
Parameter graphs
Neural DAG automorphisms
Formulating Metanets as GNNs
Expressive Power of Graph Metanets (GMNs)
Related Work
Experiments
Predicting Accuracy for Varying Architectures
Editing 2D INRs
Self-Supervised Learning with INRs
Conclusion
Limitations
...and 34 more sections

Key Result

Proposition 1

For any neural DAG automorphism $\phi$ of a computation graph, the neural network function is left unchanged: $\forall \boldsymbol{x} \in \mathcal{X},\ f_{\boldsymbol{\theta}}(\boldsymbol{x}) = f_{\Phi(\boldsymbol{\theta})}(\boldsymbol{x})$, where $\Phi$ denotes the permutation of the parameters $\boldsymbol{\theta}$ induced by $\phi$.
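A familiar special case of Proposition 1 is the hidden-neuron permutation symmetry of an MLP. The sketch below checks it numerically: permuting the hidden units (a stand-in for $\phi$ restricted to hidden nodes) and re-indexing the weights accordingly (a stand-in for $\Phi$) leaves the output unchanged up to floating-point error. The helper names `mlp` and `permute_hidden`, the tanh nonlinearity, and the layer sizes are illustrative assumptions, not taken from the paper.

```python
import math
import random


def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with a tanh hidden layer; matrices are lists of rows."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def permute_hidden(W1, b1, W2, perm):
    """Re-index hidden units: new hidden unit j is old hidden unit perm[j]."""
    W1p = [W1[p] for p in perm]                   # permute rows of layer 1
    b1p = [b1[p] for p in perm]                   # permute hidden biases
    W2p = [[row[p] for p in perm] for row in W2]  # permute columns of layer 2
    return W1p, b1p, W2p


rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 x 3
b1 = [rng.uniform(-1, 1) for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 2 x 4
b2 = [rng.uniform(-1, 1) for _ in range(2)]

perm = [2, 0, 3, 1]
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

x = [0.5, -1.0, 2.0]
out = mlp(x, W1, b1, W2, b2)
out_perm = mlp(x, W1p, b1p, W2p, b2)
max_diff = max(abs(a - b) for a, b in zip(out, out_perm))
print(max_diff)
```

The parameters differ as arrays, yet the function they define is the same, which is why a metanetwork should treat the two parameter sets identically.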

Figures (8)

Figure 1: Overview of Graph Metanetworks (GMNs). Our method converts neural network architectures into a parameter graph where edges correspond to network parameters. The bias ($b$) and batch-normalization parameters are incorporated via additional nodes with edges to the relevant layer's neurons. The graph is processed by a graph neural network operating on edge attributes. Fixed-length (invariant) predictions can be extracted by pooling the output graph features.
Figure 2: An example computation graph for a network with a single convolutional layer. The layer has a $2 \times 2$ filter kernel, a single input and output channel, and applies the filter with a stride of 2. Even in this small case of a $4 \times 4$ input image, the graph has $16$ edges for only $4$ parameters.
Figure 3: Parameter subgraph constructions for assorted layers that we implemented in our empirical evaluation. Their descriptions are given in Section \ref{sec:param_graph_construction}. Further details are discussed in Appendix \ref{appendix:graph_construction_details}.
Figure 4: Examples of neural DAG automorphisms for linear layers, convolutional layers, and residual layers. Possible node permutations are illustrated using red and purple dashed arrows; arrows of the same color represent the same transformation.
Figure 5: Histograms of CIFAR-10 accuracies for our Varying CNNs and Diverse Architectures datasets. Left and middle show train and test accuracy for the two datasets. Right shows test accuracy of Diverse Architectures split by model type.
...and 3 more figures
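The edge count quoted in the Figure 2 caption can be reproduced with a short sketch. It enumerates the edges of a computation graph for a single-channel convolution, assuming no padding and no bias; the function name `conv_computation_graph_edges` and the edge encoding are hypothetical and used only for illustration.

```python
def conv_computation_graph_edges(height, width, kernel, stride):
    """List computation-graph edges for a single-channel conv layer (no padding, no bias).

    Each edge is (input pixel, output pixel, kernel position); the kernel
    position identifies which shared parameter the edge carries.
    """
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    edges = []
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kernel):
                for kx in range(kernel):
                    src = (oy * stride + ky, ox * stride + kx)
                    edges.append((src, (oy, ox), (ky, kx)))
    return edges


# Setting from the Figure 2 caption: 4 x 4 input, 2 x 2 kernel, stride 2.
edges = conv_computation_graph_edges(4, 4, kernel=2, stride=2)
num_edges = len(edges)
num_params = len({kernel_pos for _, _, kernel_pos in edges})
print(num_edges, num_params)
```

Because each kernel weight is reused at every output position, the number of edges grows with the input resolution while the number of parameters stays fixed, which is the scalability gap between computation graphs and parameter graphs.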

Theorems & Definitions (18)

Proposition 1
Proposition 2
Proposition 3
Proposition 4
Proposition 5
proof
Proposition 6
proof
Proposition 7
proof
...and 8 more
