Walking the Weight Manifold: a Topological Approach to Conditioning Inspired by Neuromodulation

Ari S. Benjamin, Kyle Daruwalla, Christian Pehle, Abdul-Malik Zekri, Anthony M. Zador

TL;DR

This work introduces weight manifolds as a neuromodulation-inspired mechanism for conditioning neural networks, where task context selects a point on a low-dimensional manifold in weight space rather than a single weight vector. It formalizes the optimization of an entire weight manifold via a variational loss $L[\mathcal{M}] = \int_0^1 \ell(\mathcal{M}(s,\mathbf{P})) ds$ under a bounded volumetric movement constraint, deriving a practical update $\Delta \mathbf{P} = -\frac{1}{2\lambda} \left[\int_0^1 \mathbf{M}(s) ds\right]^{-1} \int_0^1 \mathbf{g}(s) ds$ where $\mathbf{M}(s)$ and $\mathbf{g}(s)$ are the local metric and gradient. The framework supports analytic closed-form inverses for common manifolds (e.g., straight lines, ellipses), enabling efficient updates, and uses a basis-point decomposition to process batches without instantiating full weight matrices for every conditional input. Empirically, simple topologies like lines and ellipses implemented as weight manifolds can outperform traditional conditioning by input concatenation, and can generalize to unseen conditioning values (e.g., rotations of CIFAR-10) better than baselines; regularization experiments reveal when manifold conditioning helps and when mis-specification can limit benefits. Overall, the paper provides a principled, topology-aligned alternative to standard conditioning, with clear theoretical and practical pathways to richer topologies and conditioning inference in future work.
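As a rough illustration of the update rule above, the sketch below applies it to the simplest topology, a straight line in weight space. Everything in it is an assumption made for illustration rather than the authors' implementation: the parameterization $W(s) = (1-s)\mathbf{P}_0 + s\mathbf{P}_1$, the choice of metric $\mathbf{M}(s) = J(s)^\top J(s)$ with $J = \partial W / \partial \mathbf{P}$, the midpoint quadrature, and the helper names `integrated_gradient`, `line_manifold_step`, and `toy_grad`. Under these assumptions $\int_0^1 \mathbf{M}(s)\,ds = \begin{pmatrix} 1/3 & 1/6 \\ 1/6 & 1/3 \end{pmatrix} \otimes I$, whose inverse is $\begin{pmatrix} 4 & -2 \\ -2 & 4 \end{pmatrix} \otimes I$, so the step needs no numerical matrix inversion.

```python
# Sketch only: one steepest-descent step for a line manifold W(s) = (1-s) P0 + s P1.
# Assumed (not taken from the paper): M(s) = J(s)^T J(s), midpoint quadrature over s.

def integrated_gradient(grad_w, p0, p1, n_quad=200):
    """Midpoint-rule estimate of int_0^1 g(s) ds with respect to (P0, P1)."""
    g0 = [0.0] * len(p0)
    g1 = [0.0] * len(p1)
    for k in range(n_quad):
        s = (k + 0.5) / n_quad
        w = [(1 - s) * u + s * v for u, v in zip(p0, p1)]  # point W(s) on the line
        for i, gi in enumerate(grad_w(w, s)):
            g0[i] += (1 - s) * gi / n_quad  # chain rule: dW/dP0 = (1 - s) I
            g1[i] += s * gi / n_quad        # chain rule: dW/dP1 = s I
    return g0, g1


def line_manifold_step(grad_w, p0, p1, lam=0.5, n_quad=200):
    """Delta P = -(1 / (2 lam)) [int M ds]^{-1} int g ds, with the inverse in closed form."""
    g0, g1 = integrated_gradient(grad_w, p0, p1, n_quad)
    scale = 1.0 / (2.0 * lam)
    # [int M ds]^{-1} = [[4, -2], [-2, 4]] (x) I for the line parameterization above.
    new_p0 = [u - scale * (4 * x - 2 * y) for u, x, y in zip(p0, g0, g1)]
    new_p1 = [v - scale * (-2 * x + 4 * y) for v, x, y in zip(p1, g0, g1)]
    return new_p0, new_p1


# Toy problem (hypothetical): the loss at s is 0.5 * ||w - target(s)||^2, where
# target(s) runs along the line from a to b, so the optimal endpoints are a and b.
a, b = [1.0, -2.0], [3.0, 0.5]


def toy_grad(w, s):
    target = [(1 - s) * ai + s * bi for ai, bi in zip(a, b)]
    return [wi - ti for wi, ti in zip(w, target)]


p0, p1 = line_manifold_step(toy_grad, [0.0, 0.0], [0.0, 0.0], lam=0.5)
print(p0, p1)  # close to a and b, up to quadrature error
```

On this quadratic toy loss, choosing `lam=0.5` makes the preconditioned step coincide with a Newton step, which is why a single update lands near the optimal endpoints. On a real network the integrals would be estimated from sampled conditioning values rather than a fixed quadrature grid.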

Abstract

One frequently wishes to learn a range of similar tasks as efficiently as possible, re-using knowledge across tasks. In artificial neural networks, this is typically accomplished by conditioning a network upon task context by injecting context as input. Brains have a different strategy: the parameters themselves are modulated as a function of various neuromodulators such as serotonin. Here, we take inspiration from neuromodulation and propose to learn weights which are smoothly parameterized functions of task context variables. Rather than optimize a weight vector, i.e. a single point in weight space, we optimize a smooth manifold in weight space with a predefined topology. To accomplish this, we derive a formal treatment of optimization of manifolds as the minimization of a loss functional subject to a constraint on volumetric movement, analogous to gradient descent. During inference, conditioning selects a single point on this manifold which serves as the effective weight matrix for a particular sub-task. This strategy for conditioning has two main advantages. First, the topology of the manifold (whether a line, circle, or torus) is a convenient lever for inductive biases about the relationship between tasks. Second, learning in one state smoothly affects the entire manifold, encouraging generalization across states. To verify this, we train manifolds with several topologies, including straight lines in weight space (for conditioning on e.g. noise level in input data) and ellipses (for rotated images). Despite their simplicity, these parameterizations outperform conditioning identical networks by input concatenation and better generalize to out-of-distribution samples. These results suggest that modulating weights over low-dimensional manifolds offers a principled and effective alternative to traditional conditioning.
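To make the idea of conditioning as selecting a point on the manifold concrete, here is a minimal sketch for a single linear layer on an ellipse, together with one way the computation can be shared across a batch. The parameterization $W(\theta) = C + A\cos\theta + B\sin\theta$ and the names `matmul_bt` and `ellipse_layer` are assumptions for illustration, not the paper's code. Because $W(\theta)$ is linear in the basis matrices $C$, $A$, $B$, the batch can be multiplied by each basis matrix once and the results mixed per example, so no per-example weight matrix is ever built.

```python
import math

# Sketch only: batched forward pass through a linear layer whose weights lie on
# an ellipse W(theta) = C + A cos(theta) + B sin(theta) (assumed parameterization).


def matmul_bt(xs, m):
    """Multiply each row vector in xs (n x d_in) by m^T, where m is d_out x d_in."""
    return [[sum(u * v for u, v in zip(x, row)) for row in m] for x in xs]


def ellipse_layer(center, axis_a, axis_b, xs, thetas):
    """Return y_i = W(theta_i) x_i for every example without forming W(theta_i)."""
    # One shared product per basis matrix, computed for the whole batch.
    hc, ha, hb = (matmul_bt(xs, m) for m in (center, axis_a, axis_b))
    # Per-example mixing with coefficients (1, cos(theta_i), sin(theta_i)).
    return [
        [c + math.cos(t) * p + math.sin(t) * q for c, p, q in zip(rc, ra, rb)]
        for rc, ra, rb, t in zip(hc, ha, hb, thetas)
    ]


# Hypothetical 2x2 basis matrices and a batch of two inputs with different phases.
center = [[1.0, 0.0], [0.0, 1.0]]
axis_a = [[0.5, 0.0], [0.0, -0.5]]
axis_b = [[0.0, 0.2], [0.2, 0.0]]
xs = [[1.0, 2.0], [3.0, -1.0]]
thetas = [0.0, math.pi / 2]

ys = ellipse_layer(center, axis_a, axis_b, xs, thetas)
print(ys)
```

With an array library the three products become ordinary batched matrix multiplications. For rotated images, the conditioning angle would play the role of `thetas`.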

Paper Structure

This paper contains 41 sections, 37 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: a. Traditional approaches map all conditions in task space to a single network in weight space. Here, conditioning corresponds to the noise added to an image (i.e. input uncertainty) in a classification task. b. Our approach maps various conditions to a parameterized manifold in weight space with known topology chosen to match the task topology. For the task in a, this corresponds to a line, but for alternative tasks like rotations of an input image, an ellipse is more appropriate. c. Manifolds are optimized via our proposed steepest descent rule such that they minimize the volumetric distance between steps. Here, we illustrate this by projecting an ellipse manifold of weights into principal component space during learning. The weights correspond to the first convolutional layer of a network trained to classify CIFAR-10 images.
  • Figure 2: a. An illustration of the training paradigm used to test the generalization abilities of our manifold approach. We sample a sparse subset of the possible conditions (rotation angles of the input) during training and test on the full set of conditions. b. Performance of the ellipse manifold network on the test set vs. a baseline network with and without conditioning.
  • Figure 3: a. Noised CIFAR-10 test accuracies for baseline networks and the line manifold. Noise procedure is described in Sec. \ref{sec:regularization}. b. A zoomed-in view of a.
  • Figure 4: a. We train an elliptical manifold in the weight space of the same CNN architecture as in the main manuscript on rotated CIFAR-10, conditioning on rotation angle by mapping it to ellipse phase. Interestingly, we find that Adam and AdamW do not show meaningful improvements over SGD with momentum. b. Manifolds can also be trained without conditioning on task variables. Here, we train an ellipse on CIFAR-10 using several optimizers, randomizing for each example which network on the manifold is chosen. Convergence accuracy is identical to training a point network. c. Here, we train on the same task as in panel b but using a ResNet18 architecture with LayerNorm. Manifolds and single points (i.e. standard training) perform similarly.