Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics
Manuel Brenner, Florian Hess, Georgia Koppe, Daniel Durstewitz
TL;DR
The paper tackles reconstructing complex dynamical systems from multimodal, often non-Gaussian time series by introducing Multimodal Teacher Forcing (MTF), a framework that couples a multimodal variational autoencoder (MVAE) with a dendritic piecewise linear RNN (dendPLRNN) via shared decoders. MTF uses the MVAE to generate a sparse, data-informed teacher signal that guides training of the dynamical systems reconstruction (DSR) model, yielding fully generative latent dynamics that preserve the geometry and long-term behavior of the underlying system. Across synthetic chaotic systems (Lorenz-63, Rössler, Lewis-Glass) and real neural data (fMRI with behavior, hippocampal spike trains with position), MTF outperforms competing strategies (SVAE, BPTT-based, and multiple shooting) and enables DSR from ordinal and symbolic data while handling missing modalities. The framework's modularity and its demonstrated success in cross-modal inference and symbolic dynamics suggest broad applicability to scientific domains where multimodal measurements are available but difficult to model jointly.
Abstract
Many, if not most, systems of interest in science are naturally described as nonlinear dynamical systems. Empirically, we commonly access these systems through time series measurements. Often such time series consist of discrete random variables rather than continuous measurements, or are composed of measurements from multiple data modalities observed simultaneously. For instance, in neuroscience we may have behavioral labels in addition to spike counts and continuous physiological recordings. While there is by now a burgeoning literature on deep learning for dynamical systems reconstruction (DSR), multimodal data integration has hardly been considered in this context. Here we provide an efficient and flexible algorithmic framework for this purpose. It rests on a multimodal variational autoencoder that generates a sparse teacher signal to guide training of a reconstruction model, exploiting recent advances in DSR training techniques. The framework makes it possible to combine various sources of information for optimal reconstruction, even allows for reconstruction from symbolic data (class labels) alone, and connects different types of observations within a common latent dynamics space. In contrast to previous multimodal data integration techniques for scientific applications, our framework is fully generative, producing, after training, trajectories with the same geometrical and temporal structure as those of the ground truth system.
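
To make the idea of a sparse teacher signal more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain piecewise linear RNN in place of the dendPLRNN, and it takes the teacher signal as a ready-made array standing in for the latent estimates an MVAE encoder would produce. The function names, the forcing interval tau, and all parameter values are hypothetical choices made for illustration. The model runs freely and its latent state is reset to the teacher signal only every tau steps.

```python
import numpy as np

def plrnn_step(z, A, W, h):
    """One step of a basic piecewise linear RNN: A * z + W @ relu(z) + h.

    Assumption: a plain PLRNN stands in for the dendPLRNN of the paper.
    """
    return A * z + W @ np.maximum(z, 0.0) + h

def rollout_sparse_tf(teacher, A, W, h, tau):
    """Roll the model forward, resetting the state to the teacher every tau steps.

    teacher : (T, M) array of latent estimates (assumed here to be given;
              in the paper's setting they would come from the MVAE encoder).
    Returns the model predictions (T, M) and the list of forced time indices.
    """
    T, M = teacher.shape
    z = teacher[0].copy()            # start from the teacher state
    states = np.empty((T, M))
    states[0] = z
    forced = [0]
    for t in range(1, T):
        z = plrnn_step(z, A, W, h)   # free-running prediction
        states[t] = z                # stored before forcing, e.g. for a loss
        if t % tau == 0:             # sparse forcing: only every tau steps
            z = teacher[t].copy()
            forced.append(t)
    return states, forced

# Hypothetical toy setup: 3 latent dimensions, 20 time steps.
rng = np.random.default_rng(0)
M, T, tau = 3, 20, 5
A = np.full(M, 0.9)                      # diagonal linear part
W = 0.05 * rng.standard_normal((M, M))   # off-diagonal coupling
h = np.zeros(M)
teacher = rng.standard_normal((T, M))    # stand-in for encoder output

states, forced = rollout_sparse_tf(teacher, A, W, h, tau)
print(forced)
```

With tau = 1 the sketch reduces to classical teacher forcing at every step, while a very large tau approaches free-running training. In this sketch tau simply controls how long the model must predict on its own between resets.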
