LOO-PIT predictive model checking

Herman Tesso, Aki Vehtari

TL;DR

This work considers predictive checking for Bayesian model assessment using the leave-one-out probability integral transform (LOO-PIT), and proves that the dependency among LOO-PIT values is non-negligible in the finite-sample case and depends on model complexity.

Abstract

We consider predictive checking for Bayesian model assessment using the leave-one-out probability integral transform (LOO-PIT). LOO-PIT values are conditional cumulative predictive probabilities given LOO predictive distributions and the corresponding left-out observations. For a well-calibrated model, LOO-PIT values should be nearly uniformly distributed, but in the finite-sample case they are not independent, because the LOO predictive distributions are determined by nearly the same data (all but one observation). We prove that this dependency is non-negligible in the finite-sample case and depends on model complexity. We propose three testing procedures that can be used for continuous and discrete dependent uniform values. We also propose an automated graphical method for visualizing local departures from the null. Extensive numerical experiments on simulated and real datasets demonstrate that the proposed tests achieve competitive performance overall and have much higher power than standard uniformity tests based on the independence assumption, which inevitably lead to a lower-than-expected rejection rate.
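As a rough illustration of what a LOO-PIT value is, the sketch below computes exact LOO-PIT values for a toy model that is an assumption of this example and not taken from the paper: observations are normal with known standard deviation `sigma` and a flat prior on the mean, so each LOO predictive distribution is available in closed form. The function names `normal_cdf` and `loo_pit_normal_known_sigma` are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def loo_pit_normal_known_sigma(y, sigma=1.0):
    """Exact LOO-PIT values for a toy model (assumed, not from the paper):
    y_i ~ Normal(mu, sigma^2) with known sigma and a flat prior on mu.

    Leaving out y_i, the predictive distribution for y_i is
    Normal(mean(y_{-i}), sigma^2 * (1 + 1/(n-1))).
    Returns the PIT values and the standardized LOO residuals.
    """
    n = len(y)
    total = sum(y)
    scale = sigma * math.sqrt(1.0 + 1.0 / (n - 1))
    # Residual of y_i against the mean of the other n-1 observations.
    z = [(yi - (total - yi) / (n - 1)) / scale for yi in y]
    pit = [normal_cdf(zi) for zi in z]
    return pit, z


rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(20)]
pit, z = loo_pit_normal_known_sigma(y)
print("sum of standardized LOO residuals:", sum(z))
```

In this toy setting the standardized LOO residuals sum to zero up to floating-point error, because every leave-one-out mean is built from the same data. The PIT values are therefore tied together by a deterministic constraint, which is one simple way to see that they cannot be independent in finite samples. This is only an illustration of the phenomenon, not the paper's proof.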

Paper Structure (33 sections, 40 equations, 11 figures, 3 tables)

This paper contains 33 sections, 40 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Graphical uniformity checks for LOO-PIT: (a) Method by Säilynoja et al. (2022): Binomial distribution-based 95% simultaneous confidence intervals for the ECDF values (y-axis). (b) Our method: We aggregate beta distribution-based pointwise tests on quantiles (x-axis) or binomial distribution-based pointwise tests on ranks (scaled y-axis) and provide automated color-coding representations that emphasize problematic parts of the ECDF. (c) Tilted ECDF, $\hat{F}(x)-x$, providing a centered visualization of (b).
  • Figure 2: Three visualizations depicting the same sample of $200$ PIT values. To assess uniformity of the sample, (left) the ECDF difference plot with 95% coverage for the ECDF shows the sample staying within the given limits. Meanwhile, the proposed color-coding-based graphical procedure detects issues with the sample and flags statistically suspicious parts of the ECDF (in red): (middle) $\gamma=0$, (right) $\gamma=\frac{\max\phi_i(v)}{2}$.
  • Figure 3: Visualizing Type I error performance for an increasing number of observations per group $m$, with $k$ fixed to $12$. (a) Posterior PIT (solid red line) shows increasing concordance with LOO-PIT as the sample size grows.
  • Figure 4: Continuous model examples: Power performance evaluation of the tests under three scenarios for data distributions: heavy-tailed, skewed, and light-tailed. We always fit a normal model. The x-axis shows the true DGP.
  • Figure 5: Discrete model examples: Power performance evaluation on binomial and Poisson models. The x-axis shows the true DGP. Our methods generally improve upon UPC, although PIET-C shows reduced power in the binomial case.
  • ...and 6 more figures

Theorems & Definitions (5)

  • Definition: Influence Function
  • Definition: Shapley Value
  • Proof: Pointwise PIT in terms of predictive CDF
  • Proof: Equation \ref{eq16}
  • Proof: Equation \ref{shap}