Recursive Learning of Asymptotic Variational Objectives

Alessandro Mastrototaro, Mathias Müller, Jimmy Olsson

TL;DR

This work builds a particle-based framework for online variational inference (VI) in state-space models (SSMs) by approximating the filter state posteriors and their derivatives with sequential Monte Carlo methods. It provides rigorous theoretical results on the learning objective, together with a numerical study demonstrating the method's efficiency in learning both model parameters and particle proposal kernels.

Abstract

General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. SSMs, comprising latent Markovian states, can be subjected to variational inference (VI), but standard VI methods like the importance-weighted autoencoder (IWAE) lack functionality for streaming data. To enable online VI in SSMs when the observations are received in real time, we propose maximising an IWAE-type variational lower bound on the asymptotic contrast function, rather than the standard IWAE ELBO, using stochastic approximation. Unlike the recursive maximum likelihood method, which directly maximises the asymptotic contrast, our approach, called online sequential IWAE (OSIWAE), allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. By approximating filter state posteriors and their derivatives using sequential Monte Carlo (SMC) methods, we create a particle-based framework for online VI in SSMs. This approach is more theoretically well-founded than recently proposed online variational SMC methods. We provide rigorous theoretical results on the learning objective and a numerical study demonstrating the method's efficiency in learning model parameters and particle proposal kernels.
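
To make the notion of a contrast function concrete: in the recursive maximum likelihood literature it is commonly taken to be the long-run limit of the time-normalised log-likelihood, $t^{-1} \log p_\theta(y_{0:t-1})$. The following is a minimal sketch, not the authors' OSIWAE algorithm, of how a bootstrap particle filter can track this running average on streaming data. The scalar linear Gaussian model, the function names `simulate_lgssm` and `running_avg_loglik`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_lgssm(a, sigma_u, sigma_v, n_steps, rng):
    """Simulate a scalar toy model (hypothetical, not the paper's setup):
    x_{t+1} = a * x_t + sigma_u * u_t,  y_t = x_t + sigma_v * v_t,
    with standard normal noise and x_0 ~ N(0, 1)."""
    x = np.empty(n_steps)
    x[0] = rng.normal()
    for t in range(1, n_steps):
        x[t] = a * x[t - 1] + sigma_u * rng.normal()
    return x + sigma_v * rng.normal(size=n_steps)


def running_avg_loglik(y, a, sigma_u, sigma_v, n_particles, rng):
    """Bootstrap particle filter estimate of (1/(t+1)) * log p(y_0, ..., y_t)
    for every t, processing the observations one at a time."""
    particles = rng.normal(size=n_particles)  # draws from the N(0, 1) initial law
    weights = np.full(n_particles, 1.0 / n_particles)
    loglik = 0.0
    running = np.empty(len(y))
    for t, y_t in enumerate(y):
        if t > 0:
            # Multinomial resampling, then propagation through the prior dynamics.
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            particles = a * particles[idx] + sigma_u * rng.normal(size=n_particles)
        # Gaussian observation log-density evaluated at each particle.
        log_w = (-0.5 * np.log(2.0 * np.pi * sigma_v**2)
                 - 0.5 * ((y_t - particles) / sigma_v) ** 2)
        shift = log_w.max()  # subtract the maximum for numerical stability
        w = np.exp(log_w - shift)
        # Increment: log of the average unnormalised weight.
        loglik += shift + np.log(w.mean())
        weights = w / w.sum()
        running[t] = loglik / (t + 1)
    return running


rng = np.random.default_rng(0)
y = simulate_lgssm(a=0.9, sigma_u=0.5, sigma_v=0.5, n_steps=300, rng=rng)
for a_candidate in (0.9, -0.9):
    avg = running_avg_loglik(y, a_candidate, 0.5, 0.5, n_particles=300, rng=rng)
    print(f"a = {a_candidate:+.1f}: average log-likelihood = {avg[-1]:.3f}")
```

Comparing the running average for the data-generating value of `a` against a clearly wrong one illustrates why this quantity can serve as a learning objective. The sketch only evaluates the objective; it does not compute gradients or learn a proposal, which is where the method described in the abstract departs from a plain bootstrap filter.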

Paper Structure

This paper contains 24 sections, 25 theorems, 175 equations, 5 figures, and 2 algorithms.

Key Result

Proposition 3.1

For all $M\in\mathbb{N}_{>0}$ there exist real-valued differentiable functions $\ell$ and $\ell^M$ on $\mathsf{\Theta}$ such that for all $\theta\in\mathsf{\Theta}$, $\mathbb{P}$-a.s.,

Figures (5)

  • Figure 1: Parameter estimation errors over time for SMC-OSIWAE, OVSMC, and RML in the scenario where $S_u = 0.2I$ and $S_v = 0.5I$. SMC-OSIWAE and RML used $N = 1000$ particles and $M = 1000$ importance samples, while OVSMC used $N = 10000$ particles to ensure comparable computational complexity. The proposal distribution $r_\theta$ (a 10-dimensional Gaussian distribution) was parameterised by two single-layer neural networks with 64 nodes each and ReLU activations and learned using $L = 5$ particles. The error bounds are based on 30 independent runs of each algorithm. With our implementation, SMC-OSIWAE took on average 42 min, OVSMC 26 min, and RML 46 min.
  • Figure 2: MSEs over time for OSIWAE, OVSMC, and RML with respect to the Kalman filter (executed for true parameters) for the linear Gaussian model with $S_u = 0.5I$ and $S_v = 0.2I$. The values are plotted as moving averages with a window of 3000 time steps. For all methods, the MSEs are based on 50 independent runs on the same data and different starting values of $A$ and $B$.
  • Figure 3: Average MAE of the estimated positions of $L = 8$ landmarks over time using OSIWAE, RML, and OVSMC in a SLAM scenario with $\sigma_{\text{motion}}^2 = 0.2$ and $\sigma_{\text{obs}}^2 = 0.1$. The proposal distribution $r_\theta(\cdot \mid x_t, y_{t+1})$ in both OSIWAE and OVSMC is learned via two distinct neural networks, each with one hidden layer of 128 nodes. All three methods use $N = 1000$ particles and OSIWAE uses $M = 1000$. Left panel: All three algorithms run on the same data, without any prior learning. Right panel: A training run is first performed using SMC-OSIWAE on a different data record to learn the proposal distribution; afterwards, all three algorithms are applied to the same data.
  • Figure 4: Log-densities of the learned proposal, the optimal kernel parameterised by the current parameter fit as well as the true parameters, and the prior kernel with true parameters. SMC-OSIWAE uses 1000 particles and $M = 1000$. The Gaussian proposal $r_{\theta}$ is parameterised by two distinct neural networks, with one hidden layer of 12 nodes each, modelling the mean and the variance of the same. In each plot, $x_t = 0.1$ and $y_{t+1} = 6$.
  • Figure 5: Mean absolute errors (MAEs) averaged over 10 runs for the estimated positions of $L = 8$ landmarks over time using OSIWAE, RML, and OVSMC in a SLAM scenario with motion noise variance $\sigma_{\text{motion}}^2 = 0.2$ and observation noise variance $\sigma_{\text{obs}}^2 = 0.1$. The dashed lines indicate the minimum and maximum MAE across all runs. The proposal distribution $r_\theta(\cdot \mid x_t, y_{t+1})$ in both OSIWAE and OVSMC is learned using two distinct neural networks, each with one hidden layer of 12 nodes. All three methods employ $N = 1000$ particles, and OSIWAE uses $M = 1000$ samples. Left panel: All three algorithms are run on the same data without any prior learning. Right panel: A training run is first performed using SMC-OSIWAE on a different dataset to learn the proposal distribution; subsequently, all three algorithms are applied to the same data. Each of the 10 runs is performed with the same observations and true landmark positions but with different initial landmark estimates.

Theorems & Definitions (54)

  • Proposition 3.1
  • Proposition 3.2
  • Theorem 3.3
  • Lemma 4.1
  • Remark A.3
  • Definition A.4
  • Definition A.5
  • Lemma A.6
  • proof
  • Remark A.7
  • ...and 44 more