A Robbins–Monro Sequence That Can Exploit Prior Information For Faster Convergence
Siwei Liu, Ke Ma, Stephan M. Goetz
TL;DR
This work introduces a prior-information Robbins–Monro (PI–RM) sequence that integrates a target-point prior, $P_{x_t}(x)$, into the stochastic root-finding framework to accelerate convergence without requiring a regression model. Each PI–RM update combines the prior with a shrinking Gaussian RM distribution: $x_{i+1} = \text{argmax}_x \big( P_{x_t}(x)\, \mathcal{N}(x \big| x_i - s_i(y_i - y_t), c_i^2) \big)$ with $c_i = c_0/i$, yielding faster early progress while preserving a.s. convergence for broad priors, including Gaussian, Gaussian mixtures, and KDE-derived priors. The authors provide convergence proofs for linear and nonlinear $f$ under Gaussian priors and extend the results to practically arbitrary priors via weighted Gaussian sums, KDEs, and regularity assumptions, complemented by a numerical study. The findings show notable early-term speedups, especially under high observation noise, and they offer a practical guideline for selecting the initial spread $c_0$ of the RM distribution. The approach broadens stochastic approximation by embedding prior information into RM iterations, with potential impact on fast root finding under limited measurements and noisy evaluations.
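To make the update rule concrete, the following is a minimal sketch for the special case of a Gaussian prior, where the argmax has a closed form: the product of two Gaussians peaks at the precision-weighted mean of their centers. The function names, the step-size schedule $s_i = s_0/i$, the noise model, and all default parameter values are assumptions for illustration only; they are not taken from the paper.

```python
import random


def pi_rm_step(x_i, y_i, y_t, s_i, c_i, prior_mean, prior_std):
    """One PI-RM update, assuming a Gaussian prior N(prior_mean, prior_std^2).

    The argmax of prior(x) * N(x | rm_mean, c_i^2) is then the
    precision-weighted mean of the two Gaussian centers.
    """
    rm_mean = x_i - s_i * (y_i - y_t)  # classic Robbins-Monro proposal
    w_prior = 1.0 / prior_std ** 2     # precision of the prior
    w_rm = 1.0 / c_i ** 2              # precision of the RM distribution
    return (w_prior * prior_mean + w_rm * rm_mean) / (w_prior + w_rm)


def run_pi_rm(f, y_t, x0, prior_mean, prior_std, c0,
              s0=1.0, noise_std=0.1, n_steps=200, seed=0):
    """Iterate PI-RM on noisy observations of f (illustrative driver).

    Assumed schedules: s_i = s0 / i (a common RM choice, not from the source)
    and c_i = c0 / i (as stated in the summary above).
    """
    rng = random.Random(seed)
    x = x0
    for i in range(1, n_steps + 1):
        y = f(x) + rng.gauss(0.0, noise_std)  # noisy function measurement
        x = pi_rm_step(x, y, y_t, s0 / i, c0 / i, prior_mean, prior_std)
    return x


if __name__ == "__main__":
    # Hypothetical example: find x with f(x) = 0 for f(x) = x - 2,
    # using a slightly wrong prior centered at 1.5.
    estimate = run_pi_rm(lambda x: x - 2.0, y_t=0.0, x0=0.0,
                         prior_mean=1.5, prior_std=1.0, c0=1.0,
                         n_steps=2000)
    print(estimate)
```

Because $c_i$ shrinks with $i$, the RM term's precision grows and the prior's pull fades, so in this sketch a misplaced prior mainly affects the early iterations. For non-Gaussian priors (mixtures, KDEs) the argmax generally has no closed form and would need a numerical maximization instead.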
Abstract
We propose a new method to improve the convergence speed of the Robbins-Monro algorithm by introducing prior information about the target point into the Robbins-Monro iteration. We incorporate the prior information without the need for a potentially wrong regression model, which would also entail additional constraints. We show that this prior-information Robbins-Monro sequence is convergent for a wide range of prior distributions, even wrong ones, including Gaussian distributions, weighted sums of Gaussians (e.g., in a kernel density estimate), and arbitrary bounded distribution functions greater than zero. We furthermore analyse the sequence numerically to understand its performance and the influence of parameters. The results demonstrate that the prior-information Robbins-Monro sequence converges faster than the standard one, especially during the first steps, which are particularly important for applications where the number of function measurements is limited, and when the noise of observing the underlying function is large. We finally propose a rule to select the parameters of the sequence.
