Martingale Posterior Neural Networks for Fast Sequential Decision Making

Gerardo Duran-Martin, Leandro Sánchez-Betancourt, Álvaro Cartea, Kevin Murphy

TL;DR

The paper addresses the challenge of fast, uncertainty-aware sequential decision making with neural networks. It proposes a predictive-first framework based on martingale posteriors, updating the one-step-ahead predictive p_{b_t}(y_{t+1}|x_{t+1}) via Kalman-filter-like recursions while decoupling decision making from parameter-space inference. The authors introduce three scalable covariance strategies—HiLoFi, LoLoFi, and LRKF—that enable online, replay-free updates for million-parameter networks and demonstrate 10–100x faster inference than classical Thompson sampling with competitive or superior performance on non-stationary contextual bandits and Bayesian optimization. The approach scales to large architectures, maintains principled uncertainty quantification through the posterior predictive, and offers practical benefits for online decision-making tasks such as recommender systems and BO. Limitations include sensitivity to linearization, absence of fixed-lag smoothing, and hyperparameter choices, with future work aimed at fully online RL extensions.

Abstract

We introduce scalable algorithms for online learning of neural network parameters and Bayesian sequential decision making. Unlike classical Bayesian neural networks, which induce predictive uncertainty through a posterior over model parameters, our methods adopt a predictive-first perspective based on martingale posteriors. In particular, we work directly with the one-step-ahead posterior predictive, which we parameterize with a neural network and update sequentially with incoming observations. This decouples Bayesian decision-making from parameter-space inference: we sample from the posterior predictive for decision making, and update the parameters of the posterior predictive via fast, frequentist Kalman-filter-like recursions. Our algorithms operate in a fully online, replay-free setting, providing principled uncertainty quantification without costly posterior sampling. Empirically, they achieve competitive performance-speed trade-offs in non-stationary contextual bandits and Bayesian optimization, offering 10-100 times faster inference than classical Thompson sampling while maintaining comparable or superior decision performance.
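The loop the abstract describes (sample from the one-step-ahead predictive to act, then update with a Kalman-filter-like recursion) can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: a linear-Gaussian model stands in for the neural network, the full covariance is stored (no HiLoFi, LoLoFi, or LRKF approximation), and the function names `predictive`, `kalman_update`, and `choose_arm` are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def mat_vec(P, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [dot(row, x) for row in P]


def predictive(mean, cov, x, obs_var):
    """Gaussian one-step-ahead predictive (mean, variance) of y given x.

    Assumption: linear model y = mean . x + noise, standing in for a network.
    """
    return dot(mean, x), dot(x, mat_vec(cov, x)) + obs_var


def kalman_update(mean, cov, x, y, obs_var):
    """One Kalman-filter recursion for a scalar observation y."""
    n = len(mean)
    Px = mat_vec(cov, x)                # P x (equals (x^T P)^T since P is symmetric)
    s = dot(x, Px) + obs_var            # innovation variance
    gain = [v / s for v in Px]          # Kalman gain K = P x / s
    err = y - dot(mean, x)              # innovation
    new_mean = [m + k * err for m, k in zip(mean, gain)]
    new_cov = [[cov[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return new_mean, new_cov


def choose_arm(mean, cov, arm_contexts, obs_var, rng):
    """Thompson-style choice: draw one reward per arm from its predictive."""
    draws = []
    for x in arm_contexts:
        mu, var = predictive(mean, cov, x, obs_var)
        draws.append(rng.gauss(mu, math.sqrt(var)))
    return max(range(len(draws)), key=lambda a: draws[a])


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [1.0, -0.5]                      # hypothetical ground truth
    arms = [[1.0, 0.0], [0.0, 1.0]]           # one fixed context per arm
    mean, cov = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(200):
        a = choose_arm(mean, cov, arms, 0.1, rng)
        reward = dot(true_w, arms[a]) + rng.gauss(0.0, math.sqrt(0.1))
        mean, cov = kalman_update(mean, cov, arms[a], reward, 0.1)
    print(mean)
```

Each step costs one matrix-vector product and one rank-one covariance correction, with no replay buffer and no iterative posterior sampling. The sketch only shows the shape of the recursion; the scalable covariance strategies and the linearization of a real network are not represented here.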

Paper Structure

This paper contains 82 sections, 14 theorems, 126 equations, 19 figures, 5 tables, 4 algorithms.

Key Result

Proposition 4.1

At time $t$, the per-step approximation error of HiLoFi under the linearized SSM (eq:ekf-measurement-model) is bounded by … where $\mathbf{E}_{{\bm h}, {\rm surr}} = 2\|\mathbf{K}_{{\bm h},t}\tilde{\mathbf{H}}_t\|_{\rm F} + \|\mathbf{K}_{{\bm h},t}\tilde{\mathbf{H}}_t\|_{\rm F}^2$, $\mathbf{E}_{{\boldsymbol{\ell}}, {\rm surr}} = \|(\mathbf{I} - \mathbf{K}_{{\boldsymbol{\ell}}, t}\,\tilde{\mathbf{L}}\cdots$ …

Figures (19)

  • Figure 1: In-between uncertainty induced by HiLoFi as a function of the processed observations.
  • Figure 2: Cumulative average reward for the bandit MNIST problem.
  • Figure 3: Recommender system results. Left: average daily reward. Right: running time.
  • Figure 4: Left panel: Performance across all BO benchmark functions. For time, lower is better. For performance, higher is better. The dashed lines correspond to the results for our method, so methods that are above and to the left are better. Right panel: Time versus performance tradeoff plots.
  • Figure 5: Rolling one-step-ahead accuracy for the MNIST dataset.
  • ...and 14 more figures

Theorems & Definitions (27)

  • Proposition 4.1: Covariance approximation error
  • Remark 4.2
  • Proposition B.1
  • Proposition B.2
  • proof
  • Remark B.3
  • Corollary B.4: Kalman filter as a Bayesian posterior
  • Proposition C.1: QR of sum of Cholesky matrices
  • proof
  • Proposition C.2: SVD of sum of low-rank matrices
  • ...and 17 more