Sample-efficient neural likelihood-free Bayesian inference of implicit HMMs

Sanmitra Ghosh, Paul J. Birrell, Daniela De Angelis

TL;DR

This work tackles Bayesian inference for implicit Hidden Markov Models (HMMs) with intractable likelihoods by extending neural likelihood-free inference (NLFI) to jointly infer the hidden-state path x and the parameters θ. It introduces Incremental Density Estimation (IDE), which learns an autoregressive flow-based posterior over state paths conditioned on θ and y, so that x can be drawn by ancestral sampling combined with importance sampling. The approach first obtains p(θ|y) via any NLFI method, then trains an IDE to model the true and approximate factors of p(x|θ,y), allowing accurate posterior predictive checks with far fewer simulations than bootstrap SMC or ABC-SMC. Across tractable and implicit HMMs (including the Lotka-Volterra, LV, and prokaryotic autoregulator, PKY, models), IDE recovers hidden states and posterior predictive distributions at near-SMC quality, which makes goodness-of-fit assessment for implicit HMMs practical at a much lower simulation cost.

Abstract

Likelihood-free inference methods based on neural conditional density estimation have been shown to drastically reduce the simulation burden in comparison to classical methods such as ABC. When applied in the context of any latent variable model, such as a Hidden Markov model (HMM), these methods are designed to estimate only the parameters, rather than the joint distribution of the parameters and the hidden states. Naive application of these methods to an HMM, ignoring the inference of this joint posterior distribution, will thus produce an inaccurate estimate of the posterior predictive distribution, in turn hampering the assessment of goodness-of-fit. To rectify this problem, we propose a novel, sample-efficient likelihood-free method for estimating the high-dimensional hidden states of an implicit HMM. Our approach relies on directly learning the intractable posterior distribution of the hidden states, using an autoregressive flow, by exploiting the Markov property. Upon evaluating our approach on some implicit HMMs, we found that the quality of the estimates retrieved using our method is comparable to what can be achieved using a much more computationally expensive SMC algorithm.
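
As a reading aid, the decomposition that the abstract alludes to can be written out as follows. This is the standard chain-rule and conditional-independence argument for an HMM, written in notation assumed here (time indices $1, \dots, T$, with $\boldsymbol{x}_{t}$ the hidden state and $\boldsymbol{y}_{t:T}$ the observations from time $t$ onwards); the paper's own equations may index or condition differently.

$$
p(\boldsymbol{\theta}, \boldsymbol{x} \mid \boldsymbol{y}) = p(\boldsymbol{\theta} \mid \boldsymbol{y}) \, p(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y}),
\qquad
p(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y}) = p(\boldsymbol{x}_{1} \mid \boldsymbol{\theta}, \boldsymbol{y}_{1:T}) \prod_{t=2}^{T} p(\boldsymbol{x}_{t} \mid \boldsymbol{x}_{t-1}, \boldsymbol{\theta}, \boldsymbol{y}_{t:T})
$$

The second identity holds because, given $\boldsymbol{x}_{t-1}$ and $\boldsymbol{\theta}$, the state $\boldsymbol{x}_{t}$ is independent of earlier states and earlier observations. A method that estimates only $p(\boldsymbol{\theta} \mid \boldsymbol{y})$ leaves the second factor of the joint posterior unaccounted for, which is why the posterior predictive distribution suffers.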

Paper Structure

This paper contains 14 sections, 19 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: The process of using neural density estimators to approximate the joint posterior distribution $p( \boldsymbol{\theta}, \boldsymbol{x}| \boldsymbol{y})$ of an HMM. First, the HMM (simulator) is used to generate a training dataset $\{ \boldsymbol{\theta}^n, \boldsymbol{x}^n, \boldsymbol{y}^n\}_{n=1}^N$ (A), which is then used to train three neural density estimators (B). Training of the estimators of $\boldsymbol{\theta}$ happens sequentially through multiple rounds, generating more simulated data in the process. Once trained, given an observed time series $\boldsymbol{y}_{o}$, for each posterior sample of $\boldsymbol{\theta}$ drawn using its estimator, the approximate factor recursively generates (C) importance samples of the latent path. The hidden states are resampled from these importance samples using weights that are the ratio of the true and approximate factors (see Algorithm 1 for the pseudocode).
  • Figure 2: Estimation of the hidden states of a nonlinear state-space model. The quality of approximations was quantified using the MSE and $90 \%$ EC, summarised using the mean (solid line) and $95 \%$ confidence intervals (shaded area), across $10$ datasets.
  • Figure 3: Accuracy of parameter estimates for the Lotka-Volterra (a) and Prokaryotic autoregulator (b) models, assessed using the log probability of the true generative parameter vector, summarised across the $10$ datasets. The log probabilities were obtained by fitting a mixture of multivariate Gaussian densities to $500$ samples drawn from an estimate of $p( \boldsymbol{\theta}| \boldsymbol{y})$ obtained using each method.
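
The prediction step described in the Figure 1 caption (panel C) can be illustrated with a small sketch. This is not the paper's Algorithm 1: the two "factors" below are hypothetical hand-written Gaussian densities standing in for trained neural density estimators, the model is one-dimensional, and the choice to resample a single state at every time step is an assumption made for illustration. All function names and the toy values of `theta_samples` and `y_obs` are invented here.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical stand-ins for the learned conditional densities.
def approx_factor_sample(rng, x_prev, y_t, theta):
    """Draw x_t from a broad toy proposal (the 'approximate factor')."""
    return rng.gauss(theta * x_prev, 1.0)


def approx_factor_logpdf(x_t, x_prev, y_t, theta):
    """Log density of the toy proposal."""
    return gaussian_logpdf(x_t, theta * x_prev, 1.0)


def true_factor_logpdf(x_t, x_prev, y_t, theta):
    """Toy 'true factor': narrower and pulled towards the observation."""
    return gaussian_logpdf(x_t, 0.5 * (theta * x_prev + y_t), 0.5)


def sample_path(rng, y, theta, x0=0.0, num_candidates=200):
    """Recursively build one latent path by importance resampling."""
    path, x_prev = [], x0
    for y_t in y:
        # Propose candidates from the approximate factor.
        candidates = [approx_factor_sample(rng, x_prev, y_t, theta)
                      for _ in range(num_candidates)]
        # Log weight = log(true factor) - log(approximate factor).
        log_w = [true_factor_logpdf(c, x_prev, y_t, theta)
                 - approx_factor_logpdf(c, x_prev, y_t, theta)
                 for c in candidates]
        # Subtract the maximum before exponentiating for numerical stability.
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        # Resample one state in proportion to the importance weights.
        x_prev = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(x_prev)
    return path


rng = random.Random(0)
y_obs = [0.3, 0.8, 1.1, 0.9, 0.4]   # toy observed series
theta_samples = [0.7, 0.8, 0.9]     # pretend draws from an estimate of p(theta | y)
paths = [sample_path(rng, y_obs, theta) for theta in theta_samples]
```

Each entry of `paths` is one latent path paired with one parameter draw, so together they act as samples from an approximation to the joint posterior in this toy setting. In the sketch the proposal is deliberately wider than the target so that the importance weights stay bounded; whether that holds for learned estimators depends on how well they are trained.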