Online Variational Sequential Monte Carlo

Alessandro Mastrototaro, Jimmy Olsson

TL;DR

The paper tackles online parameter learning and online proposal adaptation for state-space models by extending variational sequential Monte Carlo (VSMC) to the streaming-data setting. It introduces OVSMC, an online stochastic approximation of the VSMC ELBO gradient that jointly updates model parameters and an amortized particle-proposal kernel using two-time-scale updates, while avoiding backward sampling and keeping memory linear in the number of particles. Theoretical contributions establish geometric ergodicity of the online scheme and convergence of the mean-field gradient via Robbins–Monro theory, with a sublinear rate $\mathcal{O}(\log t/\sqrt{t})$ under $\gamma_t^\theta \propto t^{-1/2}$. Empirically, OVSMC demonstrates fast online adaptation and competitive performance in online and batch settings across linear Gaussian, stochastic volatility, and deep generative video models, highlighting its practicality for streaming serial data.

Abstract

Being the most classical generative model for serial data, state-space models (SSM) are fundamental in AI and statistical machine learning. In SSM, any form of parameter learning or latent state inference typically involves the computation of complex latent-state posteriors. In this work, we build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference by combining particle methods and variational inference. While standard VSMC operates in the offline mode, by re-processing repeatedly a given batch of data, we distribute the approximation of the gradient of the VSMC surrogate ELBO in time using stochastic approximation, allowing for online learning in the presence of streams of data. This results in an algorithm, online VSMC, that is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation. In addition, we provide rigorous theoretical results describing the algorithm's convergence properties as the number of data tends to infinity as well as numerical illustrations of its excellent convergence properties and usefulness also in batch-processing settings.
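
To make the idea of distributing a gradient approximation over time more concrete, the following is a minimal sketch of a Robbins–Monro update with step size proportional to $t^{-1/2}$. It is not the OVSMC algorithm: it assumes a fully observed AR(1) process, so no particles or proposal kernel are involved, and it uses a single time scale. The function name `online_ar1_estimate` and all default values are hypothetical choices made for illustration.

```python
import math
import random


def online_ar1_estimate(a_true=0.8, sigma=1.0, n_steps=20000, gamma0=0.05, seed=0):
    """Toy Robbins-Monro estimate of an AR(1) coefficient from a data stream.

    Assumption: the state is observed directly (no latent variables), so this
    only illustrates the online stochastic-approximation update, not OVSMC.
    """
    rng = random.Random(seed)
    a_hat = 0.0   # initial parameter guess
    x_prev = 0.0  # only the most recent observation is kept in memory
    for t in range(1, n_steps + 1):
        # A new observation arrives from the stream.
        x_new = a_true * x_prev + sigma * rng.gauss(0.0, 1.0)
        # Noisy gradient of the one-step log-likelihood with respect to a_hat,
        # up to the constant factor 1 / sigma**2.
        grad = (x_new - a_hat * x_prev) * x_prev
        # Decaying step size proportional to t^{-1/2}.
        gamma_t = gamma0 / math.sqrt(t)
        a_hat += gamma_t * grad
        x_prev = x_new
    return a_hat


a_hat = online_ar1_estimate()
print(f"estimated coefficient: {a_hat:.3f}")
```

Each observation is used once and then discarded, so memory does not grow with the length of the stream; the decaying step size averages out the gradient noise over time.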

Paper Structure

This paper contains 24 sections, 9 theorems, 71 equations, 8 figures, 4 algorithms.

Key Result

Proposition 4.3

Let Assumptions assum:ssm--assum:strongmixing hold. Then for every $\theta \in \Theta$, the canonical Markov chain $(Z_t^\theta)_{t \in \mathbb{N}}$ induced by $T_\theta$ is uniformly ergodic and admits a stationary distribution $\tau_{\theta}$.
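
For readers less familiar with the term, uniform ergodicity is commonly stated as geometric convergence in total variation, uniformly over the initial state. The display below is the standard textbook form, not a formula quoted from the paper; the constants $C$ and $\rho$ are generic symbols introduced here for illustration.

```latex
\sup_{z} \left\| T_\theta^t(z, \cdot) - \tau_\theta \right\|_{\mathrm{TV}}
  \le C \rho^t, \qquad t \in \mathbb{N},
\quad \text{for some } C < \infty \text{ and } \rho \in (0, 1).
```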

Figures (8)

  • Figure 1: Parameter learning curves for the one-dimensional linear Gaussian SSM in Section \ref{['subsec:lg']}, obtained using algorithm \ref{['algo:ovpf']} with $L=5$ and $N=10000$ for $S_v=0.2$ (left) and $S_v=1.2$ (right). The means and the quantiles are calculated on the basis of 100 learning curves, each starting with a different initial value and based on independently generated observation data.
  • Figure 2: Comparisons of the bootstrap, locally optimal and learned proposals for different $(x_t, y_{t+1})$. Here OVSMC was run for 50000 iterations with $L=5$ and $N=10000$ for $S_v=0.2$ (top) and $S_v=1.2$ (bottom).
  • Figure 3: ELBO evolutions of VSMC and OVSMC (each running with $L=5$) for the multivariate linear Gaussian SSM in Section \ref{['subsec:lg']} (top: $B$ sparse; bottom: $B$ dense). In each plot, which corresponds to a particular learning rate, five independent runs of each algorithm are displayed on top of each other and compared with the target log-likelihood.
  • Figure 4: Mean absolute errors estimating $A$ (left) and $B$ (right) of five independent runs of OVF, along with the distribution of 40 independent runs of OVSMC, with $L=5$ and $N=10^4$, proposal kernel as described in Section \ref{['subsec:lg']}, with 64 nodes in the hidden layer of each neural network, and ADAM learning rates $10^{-3}$. Here $d_x=d_y=10$, $S_u=0.1I$, and $S_v=0.25I$ and matrices $A$ and $B$ are diagonal with i.i.d. $\operatorname{Unif}(0.5,1)$-distributed elements.
  • Figure 5: Parameter learning curves obtained with OVSMC (with $L = 5$ and $N = 1000$) and PaRIS-based RML (Olsson and Westerborn, 2018) (with $N = 1000$), for the stochastic volatility SSM in Section \ref{['subsec:sv_rml']}, with learning rate $10^{-3}$.
  • ...and 3 more figures

Theorems & Definitions (16)

  • Proposition 4.3
  • Theorem 4.7
  • Corollary 4.8
  • Remark A.3
  • Proposition A.4: Proposition \ref{['prop:ergomain']}
  • proof
  • Proposition A.6
  • Theorem A.9: Theorem \ref{['thm:main']}
  • proof
  • Corollary A.10: Corollary \ref{['corollary:decay']}
  • ...and 6 more