Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck

Marco Federici, Patrick Forré, Ryota Tomioka, Bastiaan S. Veeling

TL;DR

The paper tackles the high cost of long-horizon Markov process simulation by learning latent representations that preserve dynamics at a chosen lag $\tau$ through a Time-lagged Information Bottleneck (T-IB) framework. It formalizes Latent Simulation (LS) to unfold trajectories in a latent space using encoders and variational transitions, and introduces autoinformation-based sufficiency to guarantee preservation of dynamics across timescales. By combining a non-linear, contrastive TI-Max objective with a bottleneck term, the method yields information-optimal representations that keep slow, relevant dynamics while discarding fast fluctuations, enabling accurate and dramatically faster latent simulations. Empirical results on synthetic slow-fast dynamics and molecular systems show that T-IB outperforms traditional linear or unregularized non-linear approaches in both representation quality and unfolded trajectory statistics, achieving substantial speedups over direct molecular dynamics and demonstrating practical impact for large-scale simulations.
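The Latent Simulation idea described above (encode once, then unfold the trajectory in latent space with a learned transition) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the linear-Gaussian `encode` and `transition` functions, the linear `predict` readout, and all parameter names (`W`, `A`, `V`, `noise_std`) are hypothetical stand-ins for the learned encoder $p_\theta({\mathbf{z}}_t|{\mathbf{x}}_t)$, the variational transition $q_\phi({\mathbf{z}}_t|{\mathbf{z}}_{t-\tau})$, and a property predictor.

```python
import numpy as np

def encode(x, W, rng, noise_std=0.1):
    """Hypothetical stochastic encoder p_theta(z_t | x_t): linear map plus Gaussian noise."""
    return x @ W + noise_std * rng.standard_normal(W.shape[1])

def transition(z, A, rng, noise_std=0.1):
    """Hypothetical latent transition q_phi(z_t | z_{t-tau}): one jump of size tau."""
    return z @ A + noise_std * rng.standard_normal(z.shape[0])

def predict(z, V):
    """Hypothetical readout of a property of interest y_t from the latent z_t."""
    return z @ V

def latent_simulation(x0, W, A, V, n_steps, seed=0):
    """Encode the initial state once, then unfold n_steps jumps in latent space."""
    rng = np.random.default_rng(seed)
    z = encode(x0, W, rng)
    zs, ys = [z], [predict(z, V)]
    for _ in range(n_steps):
        z = transition(z, A, rng)  # no call to the original simulator here
        zs.append(z)
        ys.append(predict(z, V))
    return np.stack(zs), np.stack(ys)

# Toy usage: 3-dimensional state, 2-dimensional latent, scalar property.
W = np.eye(3, 2)
A = 0.9 * np.eye(2)
V = np.ones((2, 1))
zs, ys = latent_simulation(np.ones(3), W, A, V, n_steps=5)
print(zs.shape, ys.shape)
```

Each loop iteration advances the system by one lag $\tau$ without evaluating the original dynamics, which is where the speedup over direct simulation would come from in this picture.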

Abstract

Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.
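As a rough illustration of how such an objective could be assembled, the sketch below combines a contrastive (InfoNCE-style) lower bound on the mutual information between time-lagged representations with a penalty on the mismatch between encoder and transition densities. Everything specific here is an assumption made for the example rather than a detail taken from the paper: the negative squared-distance critic, the diagonal Gaussian densities, the Monte Carlo estimate of the mismatch, and the trade-off weight `beta`.

```python
import numpy as np

def infonce_lower_bound(z_prev, z_next):
    """InfoNCE estimate of I(z_{t-tau}; z_t) from a batch of B paired samples.

    Assumed critic: negative squared Euclidean distance. The estimate is
    upper-bounded by log(B).
    """
    scores = -((z_prev[:, None, :] - z_next[None, :, :]) ** 2).sum(-1)  # (B, B)
    m = scores.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))  # stable logsumexp
    return (np.diag(scores) - log_norm).mean() + np.log(len(z_prev))

def gaussian_log_prob(z, mean, std):
    """Log-density of a diagonal Gaussian, summed over latent dimensions."""
    return (-0.5 * ((z - mean) / std) ** 2
            - np.log(std) - 0.5 * np.log(2 * np.pi)).sum(-1)

def tib_loss(z_prev, z_next, enc_mean, enc_std, trans_mean, trans_std, beta):
    """Sketch of a T-IB-style loss: -MI bound + beta * encoder/transition mismatch."""
    mi = infonce_lower_bound(z_prev, z_next)
    # Monte Carlo estimate of E[log p_theta(z_t | x_t) - log q_phi(z_t | z_{t-tau})]
    mismatch = (gaussian_log_prob(z_next, enc_mean, enc_std)
                - gaussian_log_prob(z_next, trans_mean, trans_std)).mean()
    return -mi + beta * mismatch

# Toy usage with random, weakly correlated latents.
rng = np.random.default_rng(0)
z_prev = rng.standard_normal((16, 2))
z_next = z_prev + 0.1 * rng.standard_normal((16, 2))
loss = tib_loss(z_prev, z_next, enc_mean=z_next, enc_std=0.1,
                trans_mean=z_prev, trans_std=0.2, beta=0.01)
print(float(loss))
```

Setting `beta` to zero recovers a purely contrastive objective in this sketch; larger values push the encoder toward representations whose next step the transition model can predict, which is one way to read "discarding high-frequency information".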

Paper Structure (63 sections, 3 theorems, 63 equations, 22 figures, 4 tables)


Key Result

Lemma 1

Autoinformation and Sufficiency (proof in the appendix). A representation ${\mathbf{z}}_t$ preserves autoinformation at lag time $\tau$ if and only if it is sufficient for any target ${\mathbf{y}}_{t+\tau}$. Conversely, whenever ${\mathbf{z}}_t$ does not preserve autoinformation for a lag
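
For orientation, the two notions in the lemma can be written out as follows. The notation is an assumption based on the usual information-theoretic definitions, since this excerpt does not state them: autoinformation is taken to be the mutual information between a process and its own time-lagged copy, and sufficiency is taken to mean that the original state carries no further information about the target once the representation is known.

$$
\underbrace{I({\mathbf{x}}_t; {\mathbf{x}}_{t+\tau}) = I({\mathbf{z}}_t; {\mathbf{z}}_{t+\tau})}_{\text{autoinformation preserved at lag } \tau}
\quad\Longleftrightarrow\quad
\underbrace{I({\mathbf{x}}_t; {\mathbf{y}}_{t+\tau} \mid {\mathbf{z}}_t) = 0 \ \text{ for every target } {\mathbf{y}}_{t+\tau}}_{\text{sufficiency}}
$$

Under this reading, since ${\mathbf{z}}_t$ is computed from ${\mathbf{x}}_t$, the data processing inequality gives $I({\mathbf{z}}_t; {\mathbf{z}}_{t+\tau}) \le I({\mathbf{x}}_t; {\mathbf{x}}_{t+\tau})$, so preserving autoinformation means attaining this upper bound.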

Figures (22)

  • Figure 1: The Time-lagged Information Bottleneck objective aims to maximize the mutual information between sampled representations ${\mathbf{z}}_{t-\tau}, {\mathbf{z}}_{t}$ at temporal distance $\tau$ while minimizing mismatch between the encoding distribution $p_\theta({\mathbf{z}}_{t}|{\mathbf{x}}_{t})$ and the learned variational transitional distribution $q_\phi({\mathbf{z}}_{t}|{\mathbf{z}}_{t-\tau})$. This results in minimal representations capturing dynamics at timescale $\tau$ or larger, which can be used to predict properties of interest ${\mathbf{y}}_t$, such as inter-atomic distances, over time.
  • Figure 2:
  • Figure 3:
  • Figure 5:
  • Figure 7:
  • ...and 17 more figures

Theorems & Definitions (12)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • proof
  • Remark 4
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 2 more