Graph Metanetworks for Processing Diverse Neural Architectures
Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas
TL;DR
Graph Metanetworks (GMNs) address the challenge of treating neural networks as data. They encode each input network as a graph and process that graph with a graph neural network, so that the metanetwork respects the permutation symmetries of the input's parameter space. The key ingredients are Neural DAG Automorphisms, which capture these permutation symmetries, and two graph representations of input networks. Of the two, parameter graphs provide a scalable, symmetry-preserving representation that covers diverse modules, including attention and normalization layers. Theoretically, GMNs are equivariant to permutations of the input parameters and are at least as expressive as prior metanetwork approaches. Empirically, they outperform competing methods on tasks such as predicting network accuracy, editing implicit neural representations (INRs), and self-supervised learning on INRs, across diverse architectures. This enables architecture-agnostic metanetwork analysis, with practical implications for metamodeling, model search, and the analysis of state-of-the-art networks, including CNNs, Transformers, INRs, and other architectures.
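To make the permutation symmetry concrete, here is a minimal sketch in standard-library Python for a hypothetical two-layer ReLU MLP. Reordering the hidden neurons, together with the matching rows of the first weight matrix, bias entries, and columns of the second weight matrix, changes the parameters but not the function they compute. The names `mlp_forward` and `permute_hidden` are illustrative assumptions, not code from the paper; Neural DAG Automorphisms are described as capturing symmetries of this kind for more general DAG-structured architectures.

```python
import random

def mlp_forward(W1, b1, W2, b2, x):
    """Two-layer ReLU MLP on nested lists: y = W2 relu(W1 x + b1) + b2."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden neurons so that new neuron k is old neuron perm[k].
    Rows of W1, entries of b1, and columns of W2 move together."""
    return ([list(W1[p]) for p in perm],
            [b1[p] for p in perm],
            [[row[p] for p in perm] for row in W2])

# Random weights for a hypothetical 3 -> 5 -> 2 MLP.
random.seed(0)
n_in, n_hid, n_out = 3, 5, 2
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [random.uniform(-1, 1) for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [random.uniform(-1, 1) for _ in range(n_out)]

perm = [3, 0, 4, 1, 2]
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

x = [0.5, -1.0, 2.0]
y = mlp_forward(W1, b1, W2, b2, x)
yp = mlp_forward(W1p, b1p, W2p, b2, x)
print(W1p != W1)                               # True: the weights differ
print(max(abs(a - b) for a, b in zip(y, yp)))  # ~0: outputs differ only by rounding
```

Because ReLU acts elementwise, permuting the hidden pre-activations simply permutes the hidden activations, and the second layer's matching column permutation undoes it. A metanetwork that ignores this symmetry would treat the two parameter sets as different inputs.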
Abstract
Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies have demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks: neural networks that take the weights of other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
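The graph-then-GNN idea can be sketched in a few lines. This sketch makes several assumptions: a two-layer MLP, standard-library Python, and a toy hand-written message function standing in for a learned GNN layer (it is not the GMN layer from the paper). Each neuron becomes a node and each weight becomes a directed edge. For brevity, each bias is stored as a node feature; this is an assumption, and the paper's own parameter-graph construction may encode biases differently. The names `mlp_to_graph`, `message_pass`, and `relabel` are hypothetical. Relabeling only the hidden nodes is the graph-level counterpart of the neuron permutation that leaves the MLP function unchanged. The check confirms that the layer's node outputs are permuted in the same way, i.e. the layer is equivariant to that symmetry.

```python
import math
import random

def mlp_to_graph(W1, b1, W2, b2):
    """Hypothetical parameter graph for a two-layer MLP.
    Nodes: inputs, then hidden neurons, then outputs; node feature = bias (0.0 for inputs).
    Edges: one directed edge (src, dst, weight) per weight entry."""
    n_in, n_hid = len(W1[0]), len(W1)
    feats = [0.0] * n_in + list(b1) + list(b2)
    edges = [(j, n_in + k, W1[k][j]) for k in range(n_hid) for j in range(n_in)]
    edges += [(n_in + k, n_in + n_hid + o, row[k])
              for o, row in enumerate(W2) for k in range(n_hid)]
    return feats, edges

def message_pass(feats, edges):
    """One toy message-passing layer: each node sums messages from incoming edges
    and, with a different message function, from outgoing edges, so the layer
    can distinguish edge direction. Node identities enter only via the edge list."""
    agg = [0.0] * len(feats)
    for src, dst, w in edges:
        agg[dst] += math.tanh(w * feats[src] + w)
        agg[src] += math.tanh(w * feats[dst] - w)
    return [f + a for f, a in zip(feats, agg)]

def relabel(feats, edges, order):
    """Relabel nodes so that new node i is old node order[i]."""
    new_id = {old: new for new, old in enumerate(order)}
    return ([feats[old] for old in order],
            [(new_id[s], new_id[d], w) for s, d, w in edges])

# Random weights for a hypothetical 2 -> 4 -> 1 MLP.
random.seed(1)
n_in, n_hid, n_out = 2, 4, 1
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [random.uniform(-1, 1) for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [random.uniform(-1, 1) for _ in range(n_out)]

feats, edges = mlp_to_graph(W1, b1, W2, b2)

# Permute only the hidden nodes; input and output nodes keep their labels.
hidden_perm = [2, 0, 3, 1]
order = (list(range(n_in))
         + [n_in + p for p in hidden_perm]
         + list(range(n_in + n_hid, n_in + n_hid + n_out)))

out = message_pass(feats, edges)
out_relabeled = message_pass(*relabel(feats, edges, order))

# Equivariance: the output at new node i equals the output at old node order[i].
err = max(abs(out_relabeled[i] - out[order[i]]) for i in range(len(order)))
print(err)  # 0.0
```

The layer only sums over edges and never uses a node's index except through the edge list, so any relabeling of nodes commutes with it. In this sketch, that is why a GNN over such a graph inherits equivariance to the input network's neuron permutations. Here the edge order is preserved by `relabel`, so the sums are even bit-identical.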
