General Bayesian Policy Learning

Masahiro Kato

TL;DR

This study proposes the General Bayes framework for policy learning, and shows that maximizing empirical welfare over a policy class is equivalent to minimizing a scaled squared error in the outcome difference, up to a quadratic regularization controlled by a tuning parameter $\zeta>0$.

Abstract

This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an action set to maximize its expected welfare. Typical examples include treatment choice and portfolio selection. In such problems, the statistical target is a decision rule, and the prediction of each outcome $Y(a)$ is not necessarily of primary interest. We formulate this policy learning problem by loss-based Bayesian updating. Our main technical device is a squared-loss surrogate for welfare maximization. We show that maximizing empirical welfare over a policy class is equivalent to minimizing a scaled squared error in the outcome difference, up to a quadratic regularization controlled by a tuning parameter $\zeta>0$. This rewriting yields a General Bayes posterior over decision rules that admits a Gaussian pseudo-likelihood interpretation. We clarify two Bayesian interpretations of the resulting generalized posterior, a working Gaussian view and a decision-theoretic loss-based view. As one implementation example, we introduce neural networks with tanh-squashed outputs. Finally, we provide theoretical guarantees in a PAC-Bayes style.
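The squared-loss rewriting described in the abstract can be checked numerically. The sketch below is not the paper's code: it assumes a binary-action setting with a score $f_i\in[-1,1]$, a hypothetical per-observation outcome-difference score `s`, and one particular scaling, $\frac{1}{2\zeta}(s_i-\zeta f_i)^2$, chosen because its pointwise minimizer is the clipped ratio $\max(-1,\min(1,s_i/\zeta))$, the same form as the population target reported for Figure 2. The paper's exact normalization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, zeta = 200, 2.0

# Hypothetical outcome-difference scores (stand-ins for Y(1) - Y(0));
# simulated here, not taken from the paper.
s = rng.normal(loc=0.5, scale=1.5, size=n)

def welfare(f):
    """Empirical welfare gain of a score f in [-1, 1]: mean of s_i * f_i."""
    return np.mean(s * f)

def squared_surrogate(f):
    """Scaled squared error between the outcome difference and zeta * f."""
    return np.mean((s - zeta * f) ** 2) / (2.0 * zeta)

def quad_reg(f):
    """Quadratic regularization term controlled by zeta."""
    return 0.5 * zeta * np.mean(f ** 2)

# Term that does not depend on f.
const = np.mean(s ** 2) / (2.0 * zeta)

# Identity: -welfare(f) = surrogate(f) - regularizer(f) - const, for any f.
f = rng.uniform(-1.0, 1.0, size=n)
lhs = -welfare(f)
rhs = squared_surrogate(f) - quad_reg(f) - const

# Pointwise minimizer of the surrogate over [-1, 1] is the clipped ratio.
f_star = np.clip(s / zeta, -1.0, 1.0)
grid = np.linspace(-1.0, 1.0, 2001)
grid_min = grid[np.argmin((s[:, None] - zeta * grid[None, :]) ** 2, axis=1)]

print(abs(lhs - rhs))
print(np.max(np.abs(f_star - grid_min)))
```

Under these assumptions, minimizing the surrogate alone corresponds to maximizing welfare minus the quadratic penalty, so a smaller $\zeta$ pushes the clipped score toward $\pm 1$ and a larger $\zeta$ shrinks it toward zero.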

Paper Structure (95 sections, 13 theorems, 74 equations, 6 figures, 6 tables)

This paper contains 95 sections, 13 theorems, 74 equations, 6 figures, 6 tables.

Introduction
Contributions.
Setup
Background: Generalized Bayesian Updating
Decision-theoretic derivation.
Relation to ordinary Bayes.
GBPL with Binary Actions
Empirical Welfare Maximization
Squared-loss surrogate
Loss and Generalized Posterior
Pseudo-Likelihood Interpretation
General Bayes Interpretation
GBPL with Multiple Actions
Baseline-Gap Surrogate
Baseline-Free Symmetric Full-Vector Surrogate
...and 80 more sections

Key Result

Proposition 3.1

Assume that the normalizing constant is finite. Then the unique minimizer of ${\mathcal{J}}\left(Q\right)$ over all such $Q$ is $Q=\Pi_{\eta}\left(\cdot\mid {\mathcal{D}}\right)$.
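The definition of ${\mathcal{J}}\left(Q\right)$ is not reproduced in this summary. The sketch below therefore assumes the form standard in generalized Bayesian updating, $\eta\,\mathbb{E}_Q[\text{loss}]+\mathrm{KL}(Q\,\|\,\Pi)$, and illustrates on a finite parameter grid that the Gibbs-type posterior $\propto \Pi(\theta)\exp(-\eta\,\text{loss}(\theta))$ attains the smallest value. The grid, the uniform prior, and the simulated losses are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, eta = 50, 0.7

# Hypothetical finite parameter grid with a uniform prior and
# simulated cumulative losses (not from the paper).
prior = np.full(m, 1.0 / m)
loss = rng.uniform(0.0, 5.0, size=m)

def objective(q):
    """Assumed objective: eta * E_q[loss] + KL(q || prior)."""
    mask = q > 0
    kl = np.sum(q[mask] * np.log(q[mask] / prior[mask]))
    return eta * np.sum(q * loss) + kl

# Gibbs (generalized) posterior, computed in log space for stability.
log_w = np.log(prior) - eta * loss
log_w -= log_w.max()
gibbs = np.exp(log_w) / np.exp(log_w).sum()

j_star = objective(gibbs)

# Compare against many random distributions on the same grid.
j_others = np.array(
    [objective(rng.dirichlet(np.ones(m))) for _ in range(1000)]
)

# Closed form of the minimum: minus the log normalizing constant.
log_norm = np.log(np.sum(prior * np.exp(-eta * loss)))

print(j_star, -log_norm, j_others.min())
```

In this finite setting the gap between any candidate and the Gibbs posterior equals a KL divergence, which is why the minimizer is unique and why a finite normalizing constant is needed for it to exist.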

Figures (6)

Figure 1: Synthetic binary welfare boxplots across $100$ trials for DGP1, DGP2, and DGP3.
Figure 2: GBPLNet posterior visualization in the one-dimensional binary example. The plot shows posterior draws of the score function $f_w\left(x\right)$, the posterior mean, and a pointwise $95\%$ credible band, along with the population target $\max\left(-1,\min\left(1,\tau\left(x\right)/\zeta\right)\right)$.
Figure 3: Posterior distribution of test welfare in the one-dimensional binary example. The histogram is computed from SGLD draws of $w$ and welfare evaluation under the induced deterministic policy.
Figure 4: Posterior distributions of $f_w\left(x_0\right)$ at five fixed covariate values $x_0\in\left\{-2,-1,0,1,2\right\}$ in the one-dimensional binary example. These histograms visualize local decision uncertainty around the boundary $f_w\left(x_0\right)=0$.
Figure 5: Synthetic $K=5$ welfare boxplots across $100$ trials for DGP1, DGP2, and DGP3.
...and 1 more figure

Theorems & Definitions (28)

Proposition 3.1
Theorem 4.1
Theorem 5.1
Theorem 5.2
Theorem 7.1
Proposition 7.2
Theorem 8.1
Corollary 8.2
Corollary 8.3
Corollary 8.4
...and 18 more
