Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation

Zehao Li, Zhouchen Lin, Yijie Peng

TL;DR

This work introduces a ratio-free nested multi-time-scale (NMTS) stochastic approximation framework for parameter inference in likelihood-free settings, where both the likelihood and its gradient are estimated by simulation. By coupling a fast-time-scale gradient tracker with a slow-time-scale parameter (or variational) update, NMTS eliminates ratio bias and achieves faster, more stable convergence than single-time-scale methods. The analysis establishes strong convergence, weak convergence, and $\mathbb{L}^1$ rate results, with an explicit rate $O\left(\frac{\beta_k}{\alpha_k}+\sqrt{\frac{\alpha_k}{N}}\right)$ and a step-size schedule that yields an $O(k^{-1/3})$ mean absolute error (MAE) under appropriate choices. The framework extends to variational posterior inference and neural-network-based likelihood/posterior estimation, including two-network architectures trained at different time scales, and demonstrates improvements of one to two orders of magnitude in estimation accuracy at fixed computational budgets. Theoretical guarantees are complemented by numerical experiments on MLE and PDE tasks, plus toy and real-world neural-network implementations, underscoring NMTS’s efficiency and scalability for stochastic simulators. Overall, NMTS advances likelihood-free inference by delivering bias-free gradient estimation, rigorous convergence properties, and practical performance gains in complex, high-variance simulation settings.
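
As a worked check of how the stated rate can produce a $k^{-1/3}$ schedule, the short derivation below assumes polynomial step sizes $\alpha_k = k^{-a}$ and $\beta_k = k^{-b}$ with $0 < a < b \le 1$ and a fixed batch size $N$. This parametrization is an illustrative assumption and is not taken from the source; the cap $b \le 1$ reflects the usual stochastic approximation requirement $\sum_k \beta_k = \infty$.

```latex
\begin{align*}
\frac{\beta_k}{\alpha_k} + \sqrt{\frac{\alpha_k}{N}}
  &= k^{-(b-a)} + N^{-1/2}\, k^{-a/2}, \\
\text{balancing the two exponents: } b - a = \frac{a}{2}
  &\;\Longrightarrow\; a = \frac{2b}{3}, \\
b = 1 \;\Longrightarrow\; a = \frac{2}{3}, \qquad
\frac{\beta_k}{\alpha_k} + \sqrt{\frac{\alpha_k}{N}}
  &= O\!\left(k^{-1/3}\right).
\end{align*}
```

Under these assumptions the common exponent is $b/3$, so the largest admissible $b$ gives the $k^{-1/3}$ rate quoted above.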

Abstract

We study parameter inference in simulation-based stochastic models where the analytical form of the likelihood is unknown. The main difficulty is that score evaluation as a ratio of noisy Monte Carlo estimators induces bias and instability, which we overcome with a ratio-free nested multi-time-scale (NMTS) stochastic approximation (SA) method that simultaneously tracks the score and drives the parameter update. We provide a comprehensive theoretical analysis of the proposed NMTS algorithm for solving likelihood-free inference problems, including strong convergence, asymptotic normality, and convergence rates. We show that our algorithm can eliminate the original asymptotic bias $O\big(\sqrt{\frac{1}{N}}\big)$ and accelerate the convergence rate from $O\big(\beta_k+\sqrt{\frac{1}{N}}\big)$ to $O\big(\frac{\beta_k}{\alpha_k}+\sqrt{\frac{\alpha_k}{N}}\big)$, where $N$ is the fixed batch size and $\alpha_k$ and $\beta_k$ are decreasing step sizes with $\alpha_k, \beta_k, \beta_k/\alpha_k \rightarrow 0$. With proper choice of $\alpha_k$ and $\beta_k$, our convergence rates can match the optimal rate in the multi-time-scale SA literature. Numerical experiments demonstrate that our algorithm can improve the estimation accuracy by one to two orders of magnitude at the same computational cost, making it efficient for parameter estimation in stochastic systems.
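
The idea of tracking the score on a fast time scale while updating the parameter on a slow one can be illustrated with a minimal sketch. This is not the authors' algorithm, whose recursions are not given in this summary. It assumes a hypothetical toy model $X \sim N(\theta, 1)$, $Y \mid X \sim N(X, 1)$, so the likelihood $p(y;\theta) = \mathbb{E}_Z[\varphi(y-\theta-Z)]$ is only accessed through simulation, and it assumes one particular ratio-free tracker: $D$ is driven toward the root of $\hat{p}\,D - \widehat{\nabla p} = 0$ instead of dividing two noisy estimates. The function name, step-size constants, and batch size are illustrative choices.

```python
import numpy as np

def phi(u):
    """Standard normal density."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def two_time_scale_mle(y, theta0=0.0, n_iter=10_000, batch=10, seed=0):
    """Hypothetical two-time-scale sketch for the toy model above.

    d[i] tracks the score of observation y[i] without forming a ratio;
    theta follows the averaged tracked score on a slower time scale.
    """
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    d = np.zeros(y.size)                      # one score tracker per observation
    for k in range(1, n_iter + 1):
        alpha = 2.0 * k ** (-2.0 / 3.0)       # fast step size (tracker)
        beta = 2.0 / k                        # slow step size; beta / alpha -> 0
        z = rng.standard_normal((y.size, batch))
        r = y[:, None] - theta - z            # residuals y - theta - Z
        w = phi(r)
        p_hat = w.mean(axis=1)                # unbiased estimate of p(y; theta)
        g_hat = (w * r).mean(axis=1)          # unbiased estimate of d/dtheta p(y; theta)
        d += alpha * (g_hat - p_hat * d)      # fast: move d toward the root of p*d = grad p
        theta += beta * d.mean()              # slow: ascend the tracked average score
    return theta

# Synthetic data: marginally Y ~ N(theta, 2), so the exact MLE is the sample mean.
data_rng = np.random.default_rng(1)
y_obs = 1.0 + np.sqrt(2.0) * data_rng.standard_normal(50)
theta_hat = two_time_scale_mle(y_obs)
print(theta_hat, y_obs.mean())
```

In this toy model the exact maximum likelihood estimate is the sample mean of the observations, which gives a simple reference point for the output. The step sizes follow the $\alpha_k \propto k^{-2/3}$, $\beta_k \propto k^{-1}$ pattern only as one choice satisfying $\beta_k/\alpha_k \rightarrow 0$.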

Paper Structure

This paper contains 26 sections, 27 theorems, 133 equations, 8 figures, 2 tables, 3 algorithms.

Key Result

Proposition 2

Under Assumption 1, the gradient estimator $\nabla_{\lambda}\hat{L}_M(\lambda)$ converges to the true gradient $\nabla L(\lambda)$ uniformly with respect to $\lambda$. Furthermore, viewed as a stochastic process indexed by $\lambda$, $\sqrt{M}(\nabla_{\lambda}\hat{L}_M(\lambda)-\nabla L(\lambda))$ converges to a Gaussian process $G_P$ as $M$ tends to infinity, where the Gaussian process $G_P$ has mean ze

Figures (8)

  • Figure 1: Trajectories of NMTS and STS with different sample sizes based on 100 independent experiments
  • Figure 2: Log-log plot of the MAE of the estimators versus the iteration step $k$ for the NMTS and STS algorithms, based on 100 independent experiments
  • Figure 3: Trajectories of NMTS and STS with sample size $10^4$ based on 100 independent experiments
  • Figure 4: Log-log plot of the MAE of the estimators versus the iteration step $k$ for the NMTS and STS algorithms, based on 100 independent experiments, when $N=10^3$
  • Figure 5: Posterior estimated by NMTS and STS through neural networks
  • ...and 3 more figures

Theorems & Definitions (28)

  • Proposition 2
  • Theorem 4
  • Proposition 5
  • Theorem 6
  • Remark 7
  • Proposition 8
  • Proposition 9
  • Proposition 11
  • Theorem 12
  • Lemma 13
  • ...and 18 more