Non-Bayesian Learning in Misspecified Models

Sebastian Bervoets, Mathieu Faure, Ludovic Renou

TL;DR

This paper shows that in misspecified learning problems, a conservative, non-Bayesian updating rule can yield better predictive performance than standard Bayesian updating. By formulating the update as a Robbins–Monro algorithm and analyzing the associated ODE, the authors establish that the agent's beliefs converge to a convex mixture of candidate models that maximizes the cross-entropy $V$ with respect to the true data-generating process $p^*$; the agent thus learns the mixture closest to $p^*$ in KL divergence. The key technical contribution is the Lyapunov structure provided by the cross-entropy $V(q)$ and the decomposition of the zero set $E$ into convex components, with a unique top component $C_{k^*}$ attracting the dynamics under mild conditions on the updating weights. The paper also discusses robustness to different weight schemes, contrasts the rule with generalized Bayes, and comments on potential overreaction, linking the results to the broader literature on learning under misspecification and information-theoretic decision rules. Overall, non-Bayesian updating can be advantageous in misspecified environments, and it offers computational or stability benefits when a restricted model family approximates the truth well via mixtures.
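The link between maximizing $V$ and being closest to $p^*$ in KL terms can be made explicit with a short worked identity. This summary does not spell out the definition of $V$, so the block below is a sketch under assumptions: a finite outcome set indexed by $y$, candidate models $p_\theta$, a mixture prediction $p_q$, and a sign convention for $V$ under which larger values mean a better fit. These symbols are introduced here for illustration only.

```latex
% Assumed notation: p_q is the prediction of the mixture with weights q.
p_q(y) = \sum_{\theta} q(\theta)\, p_\theta(y),
\qquad
V(q) = \sum_{y} p^*(y) \log p_q(y).

% KL divergence from the truth to the mixture prediction:
\mathrm{KL}\left(p^* \,\|\, p_q\right)
  = \sum_{y} p^*(y) \log \frac{p^*(y)}{p_q(y)}
  = \sum_{y} p^*(y) \log p^*(y) \;-\; V(q).

% The first term does not depend on q, hence
\arg\max_{q} V(q) = \arg\min_{q} \mathrm{KL}\left(p^* \,\|\, p_q\right).
```

Under these assumptions, the entropy term of $p^*$ is a constant in $q$, which is why a maximizer of $V$ is also a KL-closest mixture.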

Abstract

Deviations from Bayesian updating are traditionally categorized as biases, errors, or fallacies, thus implying their inherent "sub-optimality." We offer a more nuanced view. We demonstrate that, in learning problems with misspecified models, non-Bayesian updating can outperform Bayesian updating.

Paper Structure

This paper contains 13 sections, 12 theorems, 97 equations, 8 figures.

Key Result

Theorem 1

For all $q_0 \in \mathrm{int} \left(\mathbf{S}\right)$, the (random) limit set $\mathcal{L}((q_n)_n)$ of the process $(q_n)_n$ is contained in $C_{k^*}$, with probability one. In words, the beliefs converge to maximizers of the cross-entropy $V$.
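The convergence statement can be illustrated with a small simulation. The sketch below is not the authors' code, and this summary does not state the updating rule, so several ingredients are assumptions: the conservative rule is taken to be $q_{n+1} = (1-\gamma_{n+1})\, q_n + \gamma_{n+1}\, B(q_n, y_{n+1})$ with $B$ the Bayesian posterior, the weights are $\gamma_n = (n+1)^{-0.7}$, and the two binary candidate models and $p^*$ are made up so that the equal-weight mixture reproduces $p^*$ exactly. The function names are hypothetical.

```python
import numpy as np

def cross_entropy_V(q, models, p_star):
    """V(q) = sum_y p*(y) * log(sum_theta q(theta) * p_theta(y)) (assumed form)."""
    mixture = q @ models  # predictive distribution of the mixture
    return float(p_star @ np.log(mixture))

def conservative_learning(models, p_star, q0, n_steps, seed=0):
    """Run the assumed conservative rule: mix the current belief with the
    Bayesian posterior using a decreasing weight gamma_n."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    for n in range(1, n_steps + 1):
        y = rng.choice(len(p_star), p=p_star)  # observation from the true process
        posterior = q * models[:, y]
        posterior /= posterior.sum()           # Bayesian posterior B(q, y)
        gamma = (n + 1) ** -0.7                # assumed Robbins-Monro weights
        q = (1 - gamma) * q + gamma * posterior
    return q

# Illustrative (made-up) environment: rows are models, columns are outcomes.
models = np.array([[0.8, 0.2],   # theta_1: P(y=0)=0.8, P(y=1)=0.2
                   [0.2, 0.8]])  # theta_2: P(y=0)=0.2, P(y=1)=0.8
p_star = np.array([0.5, 0.5])    # truth lies outside the family, inside its hull

q_final = conservative_learning(models, p_star, q0=[0.9, 0.1], n_steps=20000)
v_final = cross_entropy_V(q_final, models, p_star)
v_single = max(cross_entropy_V(e, models, p_star) for e in np.eye(2))
print("final belief:", q_final)
print("V(final belief):", v_final, "best single model:", v_single)
```

In this toy setup the belief should settle near equal weights on the two models, where the mixture matches $p^*$, and its value of $V$ should exceed that of either model alone. The exponent 0.7 is one choice among weight sequences whose sum diverges while the sum of squares converges; other schedules may behave differently.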

Figures (8)

  • Figure 1: The causal models
  • Figure 2: The components in the example labeled ex:convex-components-maximal-number
  • Figure 3: The components in the example labeled ex:continuum-attractors
  • Figure 4: The four components in the example labeled ex:full-family
  • Figure 5: Constant weight and non-convergence
  • ...and 3 more figures

Theorems & Definitions (21)

  • Example 1
  • Theorem 1
  • Corollary 1
  • Example 2
  • Proposition 1
  • Example 3
  • Proposition 2
  • Theorem 2
  • Definition 1
  • Theorem 3
  • ...and 11 more