General Bayesian Policy Learning

Masahiro Kato

TL;DR

This study proposes the General Bayes framework for policy learning, and shows that maximizing empirical welfare over a policy class is equivalent to minimizing a scaled squared error in the outcome difference, up to a quadratic regularization controlled by a tuning parameter $\zeta>0$.

Abstract

This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an action set to maximize its expected welfare. Typical examples include treatment choice and portfolio selection. In such problems, the statistical target is a decision rule, and the prediction of each outcome $Y(a)$ is not necessarily of primary interest. We formulate this policy learning problem by loss-based Bayesian updating. Our main technical device is a squared-loss surrogate for welfare maximization. We show that maximizing empirical welfare over a policy class is equivalent to minimizing a scaled squared error in the outcome difference, up to a quadratic regularization controlled by a tuning parameter $\zeta>0$. This rewriting yields a General Bayes posterior over decision rules that admits a Gaussian pseudo-likelihood interpretation. We clarify two Bayesian interpretations of the resulting generalized posterior, a working Gaussian view and a decision-theoretic loss-based view. As one implementation example, we introduce neural networks with tanh-squashed outputs. Finally, we provide theoretical guarantees in a PAC-Bayes style.
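The welfare-to-squared-error rewriting can be checked numerically in the binary-action case. The sketch below is an illustration under stated assumptions, not the paper's exact formulation: `D` is a hypothetical stand-in for the (estimated) outcome difference, `f` is a policy score in $[-1,1]$, and the scaling $1/(2\zeta)$ is one convenient normalization that may differ from the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, zeta = 1000, 2.0

# Hypothetical data: D plays the role of an outcome difference,
# f the role of a policy score squashed into (-1, 1).
D = rng.normal(size=n)
f = np.tanh(rng.normal(size=n))

# Empirical welfare of the score f (linear in f).
welfare = np.mean(f * D)

# Scaled squared error in the outcome difference.
scaled_sq_error = np.mean((D - zeta * f) ** 2) / (2 * zeta)

# Quadratic term in f, controlled by zeta.
quad_reg = 0.5 * zeta * np.mean(f ** 2)

# Term that does not depend on f, so it does not affect the optimizer.
const = np.mean(D ** 2) / (2 * zeta)

# Identity: welfare = -(scaled squared error) + quadratic term + constant.
rhs = -scaled_sq_error + quad_reg + const
print(np.isclose(welfare, rhs))
```

Expanding $(D-\zeta f)^2 = D^2 - 2\zeta f D + \zeta^2 f^2$ shows why the identity holds for any `D`, `f`, and $\zeta>0$: the cross term recovers the welfare, and the remaining terms are the quadratic term in `f` and a constant.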

Paper Structure

This paper contains 95 sections, 13 theorems, 74 equations, 6 figures, 6 tables.

Key Result

Proposition 3.1

Assume that the normalizing constant is finite. Then the unique minimizer of ${\mathcal{J}}\left(Q\right)$ over all such $Q$ is $Q=\Pi_{\eta}\left(\cdot\mid {\mathcal{D}}\right)$.
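A finite-parameter illustration of this kind of result is sketched below. The definition of ${\mathcal{J}}$ is not shown in this excerpt, so the code assumes the standard variational form used for generalized (Gibbs) posteriors: $\eta$ times the expected loss under $Q$ plus the Kullback-Leibler divergence from $Q$ to the prior. The candidate set, `loss` values, and `eta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 6                                # number of candidate parameter values
prior = np.full(K, 1.0 / K)          # uniform prior over the candidates
loss = rng.uniform(size=K)           # hypothetical cumulative empirical loss
eta = 3.0                            # learning-rate (temperature) parameter

def objective(q):
    """Assumed objective: eta * E_q[loss] + KL(q || prior), q with full support."""
    return eta * np.dot(q, loss) + np.sum(q * np.log(q / prior))

# Gibbs (generalized Bayes) posterior: prior reweighted by exp(-eta * loss).
normalizer = np.sum(prior * np.exp(-eta * loss))
gibbs = prior * np.exp(-eta * loss) / normalizer

# Compare against randomly drawn alternative distributions on the same support.
alternatives = rng.dirichlet(np.ones(K), size=200)
gap = min(objective(q) - objective(gibbs) for q in alternatives)
print(gap > 0)
```

Under this assumed form, the difference `objective(q) - objective(gibbs)` equals the Kullback-Leibler divergence from `q` to `gibbs`, which is why the Gibbs posterior is the unique minimizer and why the minimum value equals the negative log of the normalizing constant.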

Figures (6)

  • Figure 1: Synthetic binary welfare boxplots across $100$ trials for DGP1, DGP2, and DGP3.
  • Figure 2: GBPLNet posterior visualization in the one-dimensional binary example. The plot shows posterior draws of the score function $f_w\left(x\right)$, the posterior mean, and a pointwise $95\%$ credible band, along with the population target $\max\left(-1,\min\left(1,\tau\left(x\right)/\zeta\right)\right)$.
  • Figure 3: Posterior distribution of test welfare in the one-dimensional binary example. The histogram is computed from SGLD draws of $w$ and welfare evaluation under the induced deterministic policy.
  • Figure 4: Posterior distributions of $f_w\left(x_0\right)$ at five fixed covariate values $x_0\in\left\{-2,-1,0,1,2\right\}$ in the one-dimensional binary example. These histograms visualize local decision uncertainty around the boundary $f_w\left(x_0\right)=0$.
  • Figure 5: Synthetic $K=5$ welfare boxplots across $100$ trials for DGP1, DGP2, and DGP3.
  • ...and 1 more figure
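The population target $\max\left(-1,\min\left(1,\tau\left(x\right)/\zeta\right)\right)$ named in the Figure 2 caption can be related to the squared-loss surrogate with a small numerical check. The sketch assumes that $\tau\left(x\right)$ is the conditional mean of the outcome difference $D$ and that the surrogate risk at $x$ is $E\left[\left(D-\zeta f\right)^2\mid x\right]$ with $f\in\left[-1,1\right]$. Both are assumptions made for illustration, and the grid values are hypothetical.

```python
import numpy as np

zeta = 1.5
tau_values = np.linspace(-3.0, 3.0, 13)   # hypothetical values of tau(x)
f_grid = np.linspace(-1.0, 1.0, 2001)     # admissible scores in [-1, 1]

# E[(D - zeta*f)^2 | x] = Var(D | x) + (tau(x) - zeta*f)^2.
# The variance term does not depend on f, so only the second term matters.
risk = (tau_values[:, None] - zeta * f_grid[None, :]) ** 2

# Grid minimizer of the surrogate risk for each value of tau(x).
f_star = f_grid[np.argmin(risk, axis=1)]

# Closed-form candidate: tau(x) / zeta clipped to [-1, 1].
target = np.clip(tau_values / zeta, -1.0, 1.0)
print(np.allclose(f_star, target, atol=1e-3))
```

Because the sign of the clipped target matches the sign of $\tau\left(x\right)$, thresholding the score at zero gives the same action as thresholding $\tau\left(x\right)$ at zero in this sketch, while $\zeta$ controls how quickly the score saturates at $\pm 1$.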

Theorems & Definitions (28)

  • Proposition 3.1
  • Theorem 4.1
  • Theorem 5.1
  • Theorem 5.2
  • Theorem 7.1
  • Proposition 7.2
  • Theorem 8.1
  • Corollary 8.2
  • Corollary 8.3
  • Corollary 8.4
  • ...and 18 more