Alternators For Sequence Modeling

Mohammad Reza Rezaei; Adji Bousso Dieng

TL;DR

Alternators address the challenge of modeling time-dependent data with complex, non-Markovian dynamics by coupling two neural networks, the observation trajectory network (OTN) and the feature trajectory network (FTN), that alternately generate observations and latent features. The two networks are trained jointly by minimizing a cross-entropy objective over the joint trajectory distributions, yielding informative low-dimensional latent dynamics and high-quality sequence predictions. Across Lorenz attractor dynamics, neural decoding from brain activity, and sea-surface temperature forecasting, alternators outperform several strong baselines in trajectory fidelity and predictive tasks while offering faster sampling. This framework provides a versatile, interpretable alternative to diffusion/score-based models and high-dimensional latent-variable models, with practical impact for scientific time-series modeling and data-imputation tasks.

Abstract

This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively, over a cycle. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first used alternators to model the Lorenz equations, often used to describe chaotic behavior. We then applied alternators to Neuroscience, to map brain activity to physical activity. Finally, we applied alternators to Climate Science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators are stable to train, fast to sample from, yield high-quality generated samples and latent variables, and often outperform strong baselines such as Mambas, neural ODEs, and diffusion models in the domains we studied.
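The alternating generative process described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `sample_alternator`, `toy_otn`, and `toy_ftn`, the Gaussian noise scales `sigma_x` and `sigma_z`, and the choice to let the FTN see both the current observation and the previous feature are all assumptions made for the example. The only details taken from the text are that an initial feature is drawn from a fixed distribution (here a standard Gaussian), that the two networks alternate between observation space and feature space, and that their parameters are shared across time steps.

```python
import numpy as np

def sample_alternator(otn, ftn, T, dz, sigma_x=0.1, sigma_z=0.1, rng=None):
    """Sample one trajectory by alternating between two networks.

    otn: callable z_prev -> mean of x_t (hypothetical stand-in for the OTN)
    ftn: callable (x_t, z_prev) -> mean of z_t (hypothetical stand-in for the FTN)
    The Gaussian noise model and its scales are assumptions of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(dz)  # z_0 drawn from a standard Gaussian
    xs, zs = [], []
    for _ in range(T):
        # Observation step: the same OTN is reused at every time step.
        mean_x = otn(z)
        x = mean_x + sigma_x * rng.standard_normal(mean_x.shape)
        # Feature step: the same FTN is reused at every time step.
        mean_z = ftn(x, z)
        z = mean_z + sigma_z * rng.standard_normal(mean_z.shape)
        xs.append(x)
        zs.append(z)
    return np.stack(xs), np.stack(zs)

# Toy stand-ins for the two networks (fixed random linear maps, not trained).
rng = np.random.default_rng(0)
W_x = rng.standard_normal((5, 2))  # feature (dim 2) -> observation (dim 5)
W_z = rng.standard_normal((2, 5))  # observation (dim 5) -> feature (dim 2)
toy_otn = lambda z: np.tanh(W_x @ z)
toy_ftn = lambda x, z: 0.5 * np.tanh(W_z @ x) + 0.5 * z

xs, zs = sample_alternator(toy_otn, toy_ftn, T=3, dz=2, rng=rng)
print(xs.shape, zs.shape)
```

With `T=3`, the loop produces three observations and three features after the initial draw, which matches the cycle length used in Figure 1. In practice the two callables would be trained neural networks rather than fixed random maps.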

Paper Structure

This paper contains 9 sections, 14 equations, 6 figures, 2 tables, and 2 algorithms.

Introduction
Alternators
Related Work
Experiments
Model System: The Lorenz Attractor
Neural Decoding: Mapping Brain Activity To Movement
Sea-Surface Temperature Forecasting
Conclusion
Appendix

Figures (6)

Figure 1: Generative process of an alternator with a cycle of length $T=3$. An initial random feature ${\bm{z}}_0$ is generated from a fixed distribution, e.g. a standard Gaussian. The rest of the observations ${\bm{x}}_{1:T}$ and features ${\bm{z}}_{1:T}$ are generated by alternating between sampling from the OTN and the FTN, respectively.
Figure 2: Alternators are better at tracking the chaotic dynamics defined by a Lorenz attractor, especially during transitions between attraction points, than baselines such as VRNN, SRNN, NODE, and Mamba.
Figure 3: Alternators tend to outperform VRNN, SRNN, NODE, and Mamba on trajectory prediction in the neural decoding task on three different datasets in terms of MAE, MSE, and CC.
Figure 4: Alternators outperform VRNN, SRNN, NODE, and Mamba on forecasting in the neural decoding task on all three datasets in terms of MAE, MSE, and CC. The results are averaged across several forecasting settings, where we varied the forecasting rate from 10$\%$ to 50$\%$. The standard errors are shown as vertical bars.
Figure 5: Alternators tend to outperform VRNN, SRNN, NODE, and Mamba on missing value imputation in the neural decoding task on three datasets in terms of MAE, MSE, and CC. The results are averaged across several imputation settings, where we varied the missing value rate from 10$\%$ to 95$\%$. The standard errors are shown as vertical bars.
...and 1 more figure
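The three metrics named in the captions for Figures 3 to 5 can be computed as in the sketch below. It assumes that CC denotes the Pearson correlation coefficient between true and predicted trajectories, which the captions do not spell out; the function name `trajectory_metrics` and the toy arrays are hypothetical and do not reproduce any result from the paper.

```python
import numpy as np

def trajectory_metrics(y_true, y_pred):
    """Return MAE, MSE, and CC for two trajectories of equal shape.

    CC is taken to be the Pearson correlation coefficient (an assumption).
    Arrays are flattened so that all time steps and dimensions are pooled.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_pred - y_true
    mae = float(np.mean(np.abs(err)))   # mean absolute error
    mse = float(np.mean(err ** 2))      # mean squared error
    cc = float(np.corrcoef(y_true, y_pred)[0, 1])  # Pearson correlation
    return {"MAE": mae, "MSE": mse, "CC": cc}

# Toy example: the prediction is exact except at the last time step.
metrics = trajectory_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0])
print(metrics)
```

Lower MAE and MSE indicate a closer match, while a CC nearer to 1 indicates that the predicted trajectory follows the shape of the true one.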
