Geometric Flow Models over Neural Network Weights

Ege Erdogan

TL;DR

This thesis addresses the challenge of learning generative models over neural network weights by explicitly incorporating the geometry and symmetries of weight space. It introduces three flow designs—Euclidean, Normalized, and Geometric—built on flow matching and powered by weight-space graph neural networks to transport priors to posteriors while respecting permutation and scaling symmetries. Empirical results across toy, small, and MNIST-scale tasks show that geometry-aware flows can generate high-quality weight samples with far fewer parameters and can transfer or be guided by task gradients, enabling effective Bayesian inference and learned initialization. The work argues that explicit geometric modeling yields more data-efficient and transferable weight-space representations, with clear paths for scaling, broader architectures, and deeper exploration of symmetry-driven priors and flows.

Abstract

Deep generative models such as flow and diffusion models have proven effective in modeling high-dimensional and complex data types such as videos or proteins, and this has motivated their use in other data modalities, such as neural network weights. A generative model of neural network weights would be useful for a diverse set of applications, such as Bayesian deep learning, learned optimization, and transfer learning. However, existing work on weight-space generative models often ignores the symmetries of neural network weights, or takes into account only a subset of them. Modeling those symmetries, such as permutation symmetries between subsequent layers in an MLP or among the filters in a convolutional network, and scaling symmetries arising with the use of non-linear activations, holds the potential to make weight-space generative modeling more efficient by effectively reducing the dimensionality of the problem. In this light, we aim to design generative models in weight space that more comprehensively respect the symmetries of neural network weights. We build on recent work on generative modeling with flow matching and on weight-space graph neural networks to design three different weight-space flows. Each of our flows takes a different approach to modeling the geometry of neural network weights, and thus allows us to explore the design space of weight-space flows in a principled way. Our results confirm that modeling the geometry of neural networks more faithfully leads to more effective flow models that can generalize to different tasks and architectures. We also show that, while our flows obtain competitive performance with orders of magnitude fewer parameters than previous work, they can be further improved by scaling them up. We conclude by listing potential directions for future work on weight-space generative models.

Paper Structure

This paper contains 67 sections, 38 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Overview of our weight-space flow. We aim to learn a flow in weight-space (1), processing neural networks with GNNs (2), using flow matching (3) and taking into account the symmetries of neural network weights (4). We propose three different flows (5) and potential use cases include Bayesian neural networks, learned weight initialization, or transfer learning (6).
  • Figure 2: Visualizing the exponential, logarithmic, and parallel transport maps on a manifold. The unique geodesic curve $\gamma$ between $p$ and $q$ with $\dot \gamma(0) = \mathbf{v}$ corresponds to the great circle of the sphere coinciding with the upper boundary in the diagrams.
  • Figure 3: Visual illustration of example permutation and scaling symmetries of neural networks. Top: Permuting the neurons at one layer and then applying the same permutation to the outgoing weights preserves the function being computed. Bottom: For a ReLU activation, multiplying the input with a positive constant and the output with its inverse preserves the function.
  • Figure 4: Linear mode connectivity. The hypothesis asserts that up to permutations, low-loss points in a neural network's loss landscape are linearly connected.
  • Figure 5: Weight-space learning. Neural network weights are processed using other neural networks for tasks such as regression (e.g. predicting the loss of unseen weights), classification (e.g. ), or generation (e.g. learned optimization).
  • ...and 10 more figures
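
The permutation and scaling symmetries described in the Figure 3 caption can be checked numerically on a tiny network. The sketch below is an illustration only and is not taken from the thesis: the one-hidden-layer ReLU MLP, its sizes, and all names (`affine`, `mlp`, `close`, `perm`, `scale`) are assumptions chosen for the example. It permutes the hidden neurons together with their outgoing weights, then rescales each hidden neuron by a positive constant while dividing its outgoing weights by the same constant, and compares the outputs with those of the original network.

```python
import math
import random


def relu(v):
    return [max(a, 0.0) for a in v]


def affine(W, b, x):
    # W[i][j] connects input j to output i.
    return [sum(w * a for w, a in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp(x, W1, b1, W2, b2):
    # One hidden layer with a ReLU activation (hypothetical toy network).
    return affine(W2, b2, relu(affine(W1, b1, x)))


def close(u, v):
    # Element-wise comparison up to floating-point error.
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(u, v))


rng = random.Random(0)
d_in, d_hidden, d_out = 3, 5, 2
W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
b2 = [rng.gauss(0, 1) for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

# Permutation symmetry: reorder the hidden neurons (rows of W1 and entries
# of b1) and apply the same reordering to the columns of W2.
perm = list(range(d_hidden))
rng.shuffle(perm)
W1_p = [W1[i] for i in perm]
b1_p = [b1[i] for i in perm]
W2_p = [[row[i] for i in perm] for row in W2]

# Scaling symmetry (ReLU): multiply each hidden neuron's incoming weights
# and bias by a positive constant and divide its outgoing weights by it.
scale = [rng.uniform(0.5, 2.0) for _ in range(d_hidden)]
W1_s = [[c * w for w in row] for row, c in zip(W1, scale)]
b1_s = [c * b for b, c in zip(b1, scale)]
W2_s = [[w / c for w, c in zip(row, scale)] for row in W2]

y = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1_p, b1_p, W2_p, b2)
y_scale = mlp(x, W1_s, b1_s, W2_s, b2)

print("permutation preserves output:", close(y, y_perm))
print("scaling preserves output:", close(y, y_scale))
```

The scaling check relies on ReLU being positively homogeneous, that is, relu(c * z) = c * relu(z) for c > 0, which is why the constants are drawn from a strictly positive range. Both transformations change the weight vector but not the function it computes, which is the redundancy a symmetry-aware weight-space model can exploit.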