From latent dynamics to meaningful representations
Dedi Wang, Yihang Wang, Luke Evans, Pratyush Tiwary
TL;DR
This work tackles the challenge of learning meaningful latent representations when prior distributions over the latents are unavailable. It introduces DynAE, a representation learner constrained solely by latent dynamics: a sliced-Wasserstein regularizer enforces overdamped Langevin behavior with a learnable transition density. The approach makes the latent variables identifiable up to an isometry and recovers physically meaningful factors across diverse systems, including alanine dipeptide and Brownian DNA, while matching or outperforming dynamics-based baselines on higher-dimensional tasks. By shifting regularization from latent priors to transition dynamics and extending the sliced-Wasserstein auto-encoder to dynamics, the framework yields robust, disentangled representations applicable to a broad range of stochastic systems. This dynamics-centric perspective offers a practical path to recovering ground-truth latent structure in complex datasets without strong prior assumptions about latent distributions.
Abstract
While representation learning has been central to the rise of machine learning and artificial intelligence, a key problem remains in making the learned representations meaningful. The typical approach is to regularize the learned representation through prior probability distributions. However, such priors are usually unavailable or ad hoc. To address this, recent efforts have shifted towards leveraging insights from physical principles to guide the learning process. In this spirit, we propose a purely dynamics-constrained representation learning framework. Instead of relying on predefined probabilities, we restrict the latent representation to follow overdamped Langevin dynamics with a learnable transition density, a prior driven by statistical mechanics. We show that this is a more natural constraint for representation learning in stochastic dynamical systems, with the crucial ability to uniquely identify the ground-truth representation. We validate our framework on different systems, including a real-world fluorescent DNA movie dataset. We show that our algorithm can uniquely identify orthogonal, isometric, and meaningful latent representations.
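To make the two ingredients named above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows one Euler-Maruyama step of overdamped Langevin dynamics and a Monte Carlo estimate of the sliced-Wasserstein distance between two sets of samples. The function names, the harmonic force, the constant diffusion coefficient, the time step, and the units (kT = 1) are all assumptions chosen for illustration; in the framework described here the transition density would be learned rather than fixed.

```python
import numpy as np

def langevin_step(z, force, diffusion, dt, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics (kT = 1 assumed).

    z: array of shape (n, d) holding n latent points.
    force: callable returning -grad U(z), same shape as z (hypothetical here).
    """
    noise = rng.standard_normal(z.shape)
    return z + diffusion * force(z) * dt + np.sqrt(2.0 * diffusion * dt) * noise

def sliced_wasserstein(x, y, n_projections, rng):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: equally sized sample sets of shape (n, d).
    Each random unit direction reduces the problem to 1D, where optimal
    transport amounts to matching sorted samples.
    """
    d = x.shape[1]
    directions = rng.standard_normal((n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    x_proj = np.sort(x @ directions.T, axis=0)  # shape (n, n_projections)
    y_proj = np.sort(y @ directions.T, axis=0)
    return float(np.mean((x_proj - y_proj) ** 2))

rng = np.random.default_rng(0)

# Hypothetical force from the harmonic potential U(z) = |z|^2 / 2.
def harmonic_force(z):
    return -z

z_t = rng.standard_normal((2000, 2))

# Two independent realizations of the same transition from z_t.
z_next_a = langevin_step(z_t, harmonic_force, 1.0, 0.01, rng)
z_next_b = langevin_step(z_t, harmonic_force, 1.0, 0.01, rng)

# A deliberately mismatched set of "next" points, shifted by 3 in each coordinate.
z_shifted = z_next_a + 3.0

sw_same = sliced_wasserstein(z_next_a, z_next_b, 64, rng)
sw_shifted = sliced_wasserstein(z_next_a, z_shifted, 64, rng)
print(sw_same, sw_shifted)
```

In this toy setting the distance between two realizations of the same transition is small, while the shifted samples give a much larger value. A regularizer of this kind can therefore penalize latent trajectories whose observed transitions disagree with those predicted by the Langevin model.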
