Navigating the Latent Space Dynamics of Neural Models

Marco Fumero, Luca Moschella, Emanuele Rodolà, Francesco Locatello

TL;DR

This work reframes autoencoders as dynamical systems on a latent manifold by defining a latent vector field $f(z)=E(D(z))$ and studying the discrete flow $z_{t+1}=f(z_t)$. Under typical regularization and architectural biases, $f$ tends to be contractive, producing attractors that capture the balance between memorization and generalization and reflect the learned data distribution. The authors connect the latent dynamics to the latent density $q(z)$, showing that trajectories ascend the score of $q$ and, when $J_f$ is symmetric, relate to a potential energy landscape, enabling an energy-based interpretation. They demonstrate practical benefits through data-free weight probing on vision foundation models and trajectory-based OOD detection, illustrating that attractors can serve as interpretable priors and that latent trajectories reveal distributional shifts. Overall, the latent dynamics framework provides a principled tool to analyze model behavior, diagnose generalization vs memorization, and probe weight-encoded knowledge without direct access to input data.
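The link between contractivity and attractors can be made explicit with the standard Banach fixed-point argument. As a sketch, assume (for illustration only; the region $\Omega$ and its closedness are assumptions, not statements from the paper) that $f$ maps a closed region $\Omega$ of the latent space into itself and is Lipschitz there with constant $L<1$. Then:

$$
\|f(z)-f(z')\| \le L\,\|z-z'\| \quad \forall\, z,z'\in\Omega
\;\;\Longrightarrow\;\;
\exists!\; z^\star\in\Omega:\; f(z^\star)=z^\star,
\qquad
\|z_t - z^\star\| \le L^{t}\,\|z_0 - z^\star\|.
$$

Under these assumptions every trajectory $z_{t+1}=f(z_t)$ started in $\Omega$ converges geometrically to the single attractor $z^\star$ of that region. A model with several attractors corresponds to several such regions, each acting as a basin.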

Abstract

Neural networks transform high-dimensional data into compact, structured representations, often modeled as elements of a lower-dimensional latent space. In this paper, we present an alternative interpretation of neural models as dynamical systems acting on the latent manifold. Specifically, we show that autoencoder models implicitly define a latent vector field on the manifold, derived by iteratively applying the encoding-decoding map, without any additional training. We observe that standard training procedures introduce inductive biases that lead to the emergence of attractor points within this vector field. Drawing on this insight, we propose to leverage the vector field as a representation for the network, providing a novel tool to analyze the properties of the model and the data. This representation makes it possible to: (i) analyze the generalization and memorization regimes of neural models, even throughout training; (ii) extract prior knowledge encoded in the network's parameters from the attractors, without requiring any input data; (iii) identify out-of-distribution samples from their trajectories in the vector field. We further validate our approach on vision foundation models, showcasing the applicability and effectiveness of our method in real-world scenarios.
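The iteration described above, repeatedly applying the encoding-decoding map to a latent point, can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' code: `make_toy_autoencoder` and `latent_trajectory` are hypothetical names, the "autoencoder" is a random linear decoder followed by a `tanh` encoder, and its weights are rescaled so that the map is contractive by construction. A trained autoencoder would be substituted for `encode` and `decode`, and would generally have many attractors rather than one.

```python
import numpy as np


def make_toy_autoencoder(latent_dim=2, data_dim=8, scale=0.5, seed=0):
    """Build a toy (untrained) encoder/decoder pair; a stand-in for a real model."""
    rng = np.random.default_rng(seed)
    w_dec = rng.normal(size=(data_dim, latent_dim))
    w_enc = rng.normal(size=(latent_dim, data_dim))
    # Rescale so the product of spectral norms is `scale` < 1. Because tanh is
    # 1-Lipschitz, f(z) = encode(decode(z)) is then a contraction.
    w_dec /= np.linalg.norm(w_dec, 2)
    w_enc *= scale / np.linalg.norm(w_enc, 2)
    bias = rng.normal(size=latent_dim)

    def decode(z):
        return w_dec @ z

    def encode(x):
        return np.tanh(w_enc @ x + bias)

    return encode, decode


def latent_trajectory(encode, decode, z0, max_steps=500, tol=1e-10):
    """Iterate z_{t+1} = encode(decode(z_t)) until the step size drops below tol."""
    trajectory = [np.asarray(z0, dtype=float)]
    for _ in range(max_steps):
        z_next = encode(decode(trajectory[-1]))
        trajectory.append(z_next)
        if np.linalg.norm(z_next - trajectory[-2]) < tol:
            break
    return np.stack(trajectory)


encode, decode = make_toy_autoencoder()
traj_a = latent_trajectory(encode, decode, np.array([8.0, -8.0]))
traj_b = latent_trajectory(encode, decode, np.array([-3.0, 5.0]))

# The residual f(z) - z is the latent vector at z; it vanishes at an attractor.
residual = np.linalg.norm(encode(decode(traj_a[-1])) - traj_a[-1])
print("steps:", len(traj_a) - 1, "final residual norm:", residual)
print("same attractor from both starts:", np.allclose(traj_a[-1], traj_b[-1], atol=1e-8))
```

Because the toy map is globally contractive, both starting points reach the same fixed point. Recording the whole trajectory rather than only its endpoint is a deliberate choice here, since the abstract uses the trajectories themselves (not just the attractors) to characterize samples.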

Paper Structure

This paper contains 32 sections, 9 theorems, 48 equations, 14 figures, 3 tables.

Key Result

Theorem 1

Let $F$ be a trained autoencoder and let $q(\mathbf{z}) = \int p(\mathbf{x}) q(\mathbf{z}|\mathbf{x}) d\mathbf{x}$ be the marginal distribution induced in latent space. Assume $q(\mathbf{z})$ is smooth and that there exists an open neighborhood $\Omega\supseteq\mathrm{supp}\,q$ and a constant $L<1$

Figures (14)

  • Figure 1: Latent dynamics of AEs. Latent vector fields induced by autoencoders with bottleneck $k=2$, trained on MNIST, with $\mathbf{z}_0 \sim \mathcal{U}[-8,8]$. Models with different initializations are shown. Colors (viridis colormap) represent vector norms ranging from violet (low) to yellow (high). The shape of the latent manifold identifies with the encoder's support. White regions indicate where the vector field vanishes, revealing attractors aligned with high-density areas of the data distribution.
  • Figure 2: Memorization vs. generalization. Attractors memorize the training data as a function of the rank of $J_f(\mathbf{z})$, controlled by adjusting the bottleneck dimension $k$ (left), which is inversely proportional to the amount of generalization attained by the model (center). On the right we show examples of attractors transitioning from a strongly memorizing model (first row) to good generalization (last row).
  • Figure 3: Latent vector field dynamics. (a) The 2D vector field ($k=2$) expands from a single attractor, eventually stabilizing and over-fitting because of capacity limits. Bottom: Evolution of larger-capacity AEs ($k=128$) across training. (b) Throughout training, the network first memorizes the data with a high memorization coefficient (blue) and then generalizes, achieving a low test error (red). (c) Evolution of the attractor count for training (blue), test (red), and noise (yellow) samples. (d) Attractors computed from training data and from Gaussian noise converge during training (green), while the separability of the trajectories (measured as FPR95, lower is better) increases (purple).
  • Figure 4: Data-free weight information probing of the Stable Diffusion model. We plot the error (MSE) vs. sparsity (number of atoms) when reconstructing samples from diverse datasets using (i) an orthonormal random basis of the latent space (blue) and (ii) attractors computed from Gaussian noise (red), showing that attractors consistently reconstruct samples better on all datasets. (Right) Reconstructions using $5\%$ of the atoms on ImageNet.
  • Figure 5: Trajectories in the latent vector field characterize distribution shifts. We measure out-of-distribution detection performance on ViTMAE. On the left we report scores for 4 different datasets, outperforming the KNN baseline. On the right, histograms of scores on the iNaturalist dataset show much better separability between in-distribution and out-of-distribution samples.
  • ...and 9 more figures

Theorems & Definitions (16)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1: informal, proof in Appendix \ref{app:proof_thm1}
  • Proposition 3.1: informal, proof in Appendix \ref{app:proof_prop_3_1}
  • Proposition 3.2: informal, proof in Appendix \ref{app:attr_gen_err}
  • Lemma A.5: Directional ascent of the residual field
  • proof
  • Theorem A.6: Convergence to latent-space modes
  • proof
  • ...and 6 more