Graph Metanetworks for Processing Diverse Neural Architectures

Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas

TL;DR

Graph Metanetworks (GMNs) treat neural networks as data: an input network is encoded as a graph and processed with a graph neural network, so that parameter-space symmetries are respected. The paper formalizes permutation symmetries as neural DAG automorphisms and considers two graph representations, computation graphs and parameter graphs, with parameter graphs offering a scalable, symmetry-preserving substrate for diverse modules, including attention and normalization layers. Theoretical results show that GMNs are equivariant to permutations of the input parameters and as expressive as prior metanetwork approaches, while experiments report stronger performance on tasks such as predicting network accuracy, editing implicit neural representations (INRs), and self-supervised learning on INRs. The approach is architecture-agnostic, with practical implications for metamodeling, model search, and the analysis of networks such as CNNs, Transformers, and INRs.

Abstract

Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
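As a rough illustration of the graph-building step described above, the following sketch turns a small MLP into a parameter graph in which every weight and every bias becomes one edge. It is a minimal sketch in plain Python, not the authors' implementation: the function name `mlp_parameter_graph`, the edge tuple layout, and the choice of one extra bias node per layer are assumptions made for this example.

```python
def mlp_parameter_graph(weights, biases):
    """Build a parameter graph for an MLP (illustrative sketch).

    weights[l][j][i] is the weight from neuron i of layer l to neuron j of layer l+1.
    biases[l][j] is the bias of neuron j of layer l+1.
    Returns (num_nodes, edges); each edge is (src, dst, value, kind).
    """
    # Layer widths: input width, then one width per weight matrix.
    layer_sizes = [len(weights[0][0])] + [len(W) for W in weights]

    # Assign consecutive node ids to the neurons of each layer.
    offsets, num_nodes = [], 0
    for size in layer_sizes:
        offsets.append(num_nodes)
        num_nodes += size

    edges = []
    for l, (W, b) in enumerate(zip(weights, biases)):
        # Assumption of this sketch: one additional bias node per layer.
        bias_node = num_nodes
        num_nodes += 1
        for j, row in enumerate(W):
            dst = offsets[l + 1] + j
            for i, w in enumerate(row):
                edges.append((offsets[l] + i, dst, w, "weight"))
            edges.append((bias_node, dst, b[j], "bias"))
    return num_nodes, edges


# A 2-3-1 MLP: 6 neurons + 2 bias nodes, 9 weights + 4 biases.
weights = [[[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]], [[1.0, -2.0, 0.5]]]
biases = [[0.1, -0.2, 0.3], [0.05]]
num_nodes, edges = mlp_parameter_graph(weights, biases)
print(num_nodes, len(edges))  # 8 13
```

The edge values would serve as the edge attributes that a graph neural network consumes; the number of edges equals the number of parameters, which is what keeps this representation compact.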

Paper Structure (49 sections, 12 theorems, 38 equations, 8 figures, 5 tables)

Key Result

Proposition 1

For any neural DAG automorphism $\phi$ of a computation graph, the neural network function is left unchanged: $\forall \boldsymbol{x} \in \mathcal{X},\ f_{\boldsymbol{\theta}}(\boldsymbol{x}) = f_{\Phi(\boldsymbol{\theta})}(\boldsymbol{x})$, where $\Phi$ denotes the permutation of the parameters $\boldsymbol{\theta}$ induced by $\phi$.
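A familiar special case of this statement is the hidden-neuron permutation symmetry of an MLP: reordering the hidden units, together with the matching rows and columns of the adjacent weight matrices, leaves the function unchanged. The sketch below checks this numerically for a tiny one-hidden-layer network. The helper names `mlp` and `permute_hidden`, the `tanh` activation, and the concrete numbers are assumptions chosen for illustration only.

```python
import math


def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer MLP with tanh activation, in plain Python."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden neurons: rows of W1, entries of b1, columns of W2."""
    W1p = [W1[p] for p in perm]
    b1p = [b1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, b1p, W2p


W1 = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[1.0, -2.0, 0.5]]
b2 = [0.05]
x = [0.3, -0.7]

perm = [2, 0, 1]  # new hidden neuron k is old hidden neuron perm[k]
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

y = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1p, b1p, W2p, b2)

# Outputs agree up to floating-point summation order.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for a, b in zip(y, y_perm))
print(same)  # True
```

Here the permuted parameters play the role of $\Phi(\boldsymbol{\theta})$. A comparison with a tolerance is used rather than exact equality because the permutation changes the order in which the output sum is accumulated.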

Figures (8)

  • Figure 1: Overview of Graph Metanetworks (GMNs). Our method converts neural network architectures into a parameter graph where edges correspond to network parameters. The bias ($b$) and batch-normalization parameters are incorporated via additional nodes with edges to the relevant layer's neurons. The graph is processed by a graph neural network operating on edge attributes. Fixed-length (invariant) predictions can be extracted by pooling the output graph features.
  • Figure 2: An example computation graph for a network with a single convolutional layer. The layer has a $2 \times 2$ filter kernel, a single input and output channel, and applies the filter with a stride of 2. Even in this small case of a $4 \times 4$ input image, the graph has $16$ edges for only $4$ parameters.
  • Figure 3: Parameter subgraph constructions for assorted layers that we implemented in our empirical evaluation. Their descriptions are given in Section \ref{sec:param_graph_construction}. Further details are discussed in Appendix \ref{appendix:graph_construction_details}.
  • Figure 4: Examples of neural DAG automorphisms for linear layers, convolutional layers, and residual layers. Possible node permutations are illustrated using red and purple dashed arrows; arrows of the same color represent the same transformation.
  • Figure 5: Histograms of CIFAR-10 accuracies for our Varying CNNs and Diverse Architectures datasets. Left and middle show train and test accuracy for the two datasets. Right shows test accuracy of Diverse Architectures split by model type.
  • ...and 3 more figures
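
The edge count quoted in the Figure 2 caption can be reproduced with a short enumeration. The sketch below lists one computation-graph edge per (input pixel, output pixel) connection for a single-channel convolution and records which kernel entry each edge shares. The function name `conv_computation_graph_edges` is hypothetical, and the sketch assumes no padding and no bias.

```python
def conv_computation_graph_edges(height, width, kernel, stride):
    """Enumerate computation-graph edges of a single-channel 2D convolution.

    Each edge is (input_pixel, output_pixel, kernel_index). Edges with the
    same kernel_index share one parameter (weight sharing).
    Assumes no padding and no bias term.
    """
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    edges = []
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kernel):
                for kx in range(kernel):
                    src = (oy * stride + ky, ox * stride + kx)
                    edges.append((src, (oy, ox), (ky, kx)))
    return edges


# Setting from the Figure 2 caption: 4x4 input, 2x2 kernel, stride 2.
conv_edges = conv_computation_graph_edges(4, 4, 2, 2)
num_edges = len(conv_edges)
num_params = len({param for _, _, param in conv_edges})
print(num_edges, num_params)  # 16 4
```

Because the number of edges grows with the input resolution while the number of parameters stays fixed, a representation with one edge per parameter avoids this blow-up.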

Theorems & Definitions (18)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • proof
  • Proposition 6
  • proof
  • Proposition 7
  • proof
  • ...and 8 more