Latent Matters: Learning Deep State-Space Models

Alexej Klushyn, Richard Kurle, Maximilian Soelch, Botond Cseke, Patrick van der Smagt

TL;DR

The extended Kalman VAE (EKVAE) is introduced, which combines amortised variational inference with classic Bayesian filtering/smoothing to model dynamics more accurately than RNN-based DSSMs.

Abstract

Deep state-space models (DSSMs) enable temporal predictions by learning the underlying dynamics of observed sequence data. They are often trained by maximising the evidence lower bound. However, as we show, this does not ensure the model actually learns the underlying dynamics. We therefore propose a constrained optimisation framework as a general approach for training DSSMs. Building upon this, we introduce the extended Kalman VAE (EKVAE), which combines amortised variational inference with classic Bayesian filtering/smoothing to model dynamics more accurately than RNN-based DSSMs. Our results show that the constrained optimisation framework significantly improves system identification and prediction accuracy on the example of established state-of-the-art DSSMs. The EKVAE outperforms previous models w.r.t. prediction accuracy, achieves remarkable results in identifying dynamical systems, and can furthermore successfully learn state-space representations where static and dynamic features are disentangled.

Paper Structure (38 sections, 46 equations, 24 figures, 7 tables, 1 algorithm)

Figures (24)

  • Figure 1: VHP-DKS (CO) on pendulum (image data). Learning the state-space representation with the CO framework. Distortion and rate are balanced by the Lagrange multiplier $\lambda$, which is updated (cf. Alg. \ref{alg:seq-vhp-rewo}) such that the model first improves the reconstruction quality/constraint by learning the rotation angle (see epoch 70). As soon as the constraint is satisfied, $\lambda$ decreases and the model starts learning the underlying dynamics, i.e. to represent the angular velocity. Robustness w.r.t. the hyperparameter $\mathcal{D}_0$ is demonstrated in App. \ref{app:seq-vhp-results_1} (Fig. \ref{fig:seq-vhp-app_f1_2}).
  • Figure 2: Pendulum (image data). In contrast to annealing (bottom), CO (middle & top) enables the model to learn the underlying dynamic system, as we verify in Table \ref{tab:seq-vhp-si_pend}. Furthermore, the VHP (top) significantly improves the quality of generated sequences. This is because the VHP learns a prior $p(\mathbf{z}_1)=\mathop{\mathbb{E}_{p(\boldsymbol{\zeta})}} [p(\mathbf{z}_1\vert\, \boldsymbol{\zeta})]$ that matches the manifold of $\mathop{\mathbb{E}_{p_\mathcal{D}}} [q(\mathbf{z}_1\vert\, \mathbf{x}_{1:T}, \mathbf{u}_{1:T})]$ (cf. first two columns of the latent-space visualisations).
  • Figure 3: VHP-EKVAE (CO) on reacher (RGB image data). Five-dimensional state-space representation (disentangled): the first three dimensions ($z_1, z_2, z_3$) represent the two joint angles, the last two dimensions ($z_4, z_5$) represent the respective angular velocities. The barrel shape indicates the model has learned that the first joint can do a 360-degree turn, whereas the second joint is restricted to avoid self-collisions (cf. Fig. \ref{fig:seq-vhp-fig8}).
  • Figure 4: VHP-KVAE (CO). The predictions show that the KVAE encodes the angular velocity of the pendulum in $\mathbf{h}_{t}=\text{LSTM}(\mathbf{a}_{1:t})$ and not in $\mathbf{z}_t$. This causes the poor smoothing-based predictions, as $\mathbf{h}_{1}=\text{LSTM}(\mathbf{a}_{1})$ does not have access to sequence data and therefore cannot infer the angular velocity.
  • Figure 5: VHP-EKVAE (CO) on pendulum (image data). Disentangled (position--velocity) state-space representations: $\mathbf{H}$ defines the latent dimensions where the model learns to encode the rotation angle and the angular velocity.
  • ...and 19 more figures
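
The caption of Figure 1 describes a Lagrange multiplier $\lambda$ that rises while the reconstruction constraint is violated and falls once it is satisfied. The sketch below illustrates that qualitative behaviour only. It is not the paper's update algorithm: the multiplicative rule, the step size `nu`, the clipping bounds, and the toy distortion values are all assumptions chosen for illustration, and `d0` stands in for the hyperparameter $\mathcal{D}_0$.

```python
import math


def update_lambda(lam, distortion, d0, nu=0.1, lam_min=1e-6, lam_max=1e6):
    """Hypothetical multiplicative multiplier update (an assumption, not the paper's rule).

    lambda grows while distortion > d0 (constraint violated)
    and shrinks once distortion <= d0 (constraint satisfied).
    """
    lam = lam * math.exp(nu * (distortion - d0))
    # Clip to keep the multiplier positive and numerically bounded.
    return min(max(lam, lam_min), lam_max)


def lagrangian(rate, distortion, lam, d0):
    """Objective of the form rate + lambda * (distortion - d0)."""
    return rate + lam * (distortion - d0)


# Toy distortion trajectory (made-up numbers): reconstruction improves over time.
distortions = [5.0, 3.0, 1.5, 0.8, 0.5, 0.5]
d0 = 1.0

lam = 1.0
history = []
for d in distortions:
    lam = update_lambda(lam, d, d0)
    history.append(lam)

for step, (d, l) in enumerate(zip(distortions, history)):
    print(f"step {step}: distortion={d:.2f}, lambda={l:.4f}")
```

In this toy run, $\lambda$ increases over the first three steps, where the distortion exceeds `d0`, and decreases afterwards, so the rate term gains relative weight in the objective once the constraint holds.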