Calibrating Bayesian Inference

Yang Liu, Jonathan P. Williams, Jan Hannig

Abstract

Bayesian statistics has gained popularity in psychological research due to its intuitive uncertainty quantification and convenient information-updating rules. In many applications, however, prior distributions are introduced merely as instruments to facilitate computation, rather than as representations of genuine subjective belief. Consequently, relying on standard Bayesian justifications for inferential procedures becomes conceptually ungrounded. In this paper, we recommend evaluating finite-sample performance over repeated sampling of data and parameters as an alternative justification for "pragmatic Bayes." We demonstrate a key vulnerability in the usual posterior-based inference: when the analyst's chosen prior distribution does not match the true parameter-generating process, Bayesian inference can be misleading. Given that this true process is rarely known in practice, we propose a safer alternative: calibrating Bayesian credible regions to achieve frequentist validity. This latter criterion is stronger and guarantees validity of Bayesian inference regardless of the underlying parameter-generating mechanism. To solve the calibration problem in practice, we propose a novel stochastic approximation algorithm. In a Monte Carlo experiment, we observe that uncalibrated Bayesian inference can be liberal under certain parameter-generating scenarios, whereas our calibrated solution consistently maintains validity. We also illustrate the proposed calibration procedure using a real-data example involving location-scale regression.

Paper Structure

This paper contains 27 sections, 29 equations, 5 figures, 1 table, and 2 algorithms.

Figures (5)

  • Figure 1: Graphical illustration of the posterior density and the possibility contour. Left: Thick colored horizontal line segments are credible regions $C_\alpha(\mathbf{y})$ for $\alpha = .1$ (red), $.2$ (blue), and $.3$ (green). These regions carry 90%, 80%, and 70% posterior probability, respectively, depicted by the shaded areas under the posterior density in matching colors. Right: The same three credible intervals are repositioned vertically to match their $\alpha$ levels. Stitching credible regions in the same fashion across all $\alpha$ levels yields the posterior possibility contour function.
  • Figure 2: Simulation summary: $m = 3$ design variables. Rows of the graphical table represent three parameter-generating scenarios (S1--S3). Columns represent six types of test statistics: the first two columns correspond to the Wald and posterior density ratio (PDR) statistics for simultaneous inference of all parameters, and the remaining four columns correspond to the marginal Wald statistics for selected parameters ($\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$). Six empirical distribution functions (EDFs) of posterior possibilities are presented in each panel. Colors are used to contrast results based on chi-square approximation (green), Markov chain Monte Carlo (MCMC) sampling (blue), and the proposed calibration algorithm (red). Line types are used to distinguish strong ($t_5(0, .5^2)$; solid) and weak ($t_5(0, 25^2)$; dashed) priors. The diagonal dotted line in each panel indicates exact uniformity; the gray area shows a 95% pointwise Monte Carlo confidence band based on a normal approximation. EDFs above the diagonal signify liberal and thus invalid inference, whereas EDFs below the diagonal imply conservative and thus valid inference.
  • Figure 3: Simulation summary: $m = 10$ design variables. Rows of the graphical table represent three parameter-generating scenarios (S1--S3). Columns represent six types of test statistics: the first two columns correspond to the Wald and posterior density ratio (PDR) statistics for simultaneous inference of all parameters, and the remaining four columns correspond to the marginal Wald statistics for selected parameters ($\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$). Six empirical distribution functions (EDFs) of posterior possibilities are presented in each panel. Colors are used to contrast results based on chi-square approximation (green), Markov chain Monte Carlo (MCMC) sampling (blue), and the proposed calibration algorithm (red). Line types are used to distinguish strong ($t_5(0, .5^2)$; solid) and weak ($t_5(0, 25^2)$; dashed) priors. The diagonal dotted line in each panel indicates exact uniformity; the gray area shows a 95% pointwise Monte Carlo confidence band based on a normal approximation. EDFs above the diagonal signify liberal and thus invalid inference, whereas EDFs below the diagonal imply conservative and thus valid inference.
  • Figure 4: Posterior possibilities for simultaneous inference. Results for the Wald and posterior density ratio (PDR) statistics are displayed in separate panels. Results based on the chi-square approximation, Markov chain Monte Carlo (MCMC) sampling, and calibration are displayed in green, blue, and red, respectively.
  • Figure 5: Posterior possibilities for marginal inference. Each panel corresponds to a single focal parameter. Results based on the chi-square approximation, Markov chain Monte Carlo (MCMC) sampling, and calibration are displayed in green, blue, and red, respectively. The vertical dashed line in each panel marks the maximum a posteriori estimate.
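
The constructions described in the captions of Figures 1 to 3 can be sketched in a toy setting. The following is a minimal illustration only, not the paper's location-scale regression model and not its calibration algorithm: it assumes a normal mean with known standard deviation and a conjugate normal prior, so that credible regions and the possibility contour have closed forms. All function names and settings (`posterior_params`, `possibility`, `simulate`, the sample size, the prior scale, the mismatched generating scale) are hypothetical choices made for this sketch. It evaluates the possibility contour at the true parameter over repeated sampling of parameters and data, once when parameters are drawn from the prior itself and once when they are drawn from a wider distribution.

```python
import math
import random
from statistics import NormalDist


def posterior_params(y, sigma, mu0, tau0):
    """Conjugate normal-normal update for a mean with known sigma (toy model)."""
    precision = 1.0 / tau0**2 + len(y) / sigma**2
    mean = (mu0 / tau0**2 + sum(y) / sigma**2) / precision
    return mean, math.sqrt(1.0 / precision)


def credible_interval(alpha, post_mean, post_sd):
    """Central 100(1 - alpha)% credible interval of the normal posterior."""
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * post_sd
    return post_mean - half, post_mean + half


def possibility(theta, post_mean, post_sd):
    """Largest alpha whose credible interval still contains theta.

    For a normal posterior this is 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = abs(theta - post_mean) / post_sd
    return math.erfc(z / math.sqrt(2.0))


def edf_at(values, alpha):
    """Empirical distribution function of `values` evaluated at alpha."""
    return sum(v <= alpha for v in values) / len(values)


def simulate(draw_theta, n_rep=20000, n=10, sigma=1.0, mu0=0.0, tau0=0.5, seed=1):
    """Possibility at the true theta over repeated draws of theta and data."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        theta = draw_theta(rng)                      # parameter-generating process
        y = [rng.gauss(theta, sigma) for _ in range(n)]
        post_mean, post_sd = posterior_params(y, sigma, mu0, tau0)
        out.append(possibility(theta, post_mean, post_sd))
    return out


# Assumed scenarios: the analyst's prior is N(0, 0.5^2) in both cases.
matched = simulate(lambda rng: rng.gauss(0.0, 0.5))      # truth drawn from the prior
mismatched = simulate(lambda rng: rng.gauss(0.0, 2.0))   # truth more dispersed

for alpha in (0.1, 0.2, 0.3):
    print(alpha, edf_at(matched, alpha), edf_at(mismatched, alpha))
```

In this sketch, an EDF value close to $\alpha$ corresponds to the diagonal in Figures 2 and 3, and a value well above $\alpha$ corresponds to a curve above the diagonal, that is, liberal inference. The endpoints of `credible_interval(alpha, ...)` have possibility exactly $\alpha$, which mirrors the stitching construction in the right panel of Figure 1.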