Bayes with No Shame: Admissibility Geometries of Predictive Inference

Nicholas G. Polson, Daniel Zantedeschi

Abstract

Four distinct admissibility geometries govern sequential and distribution-free inference: Blackwell risk dominance over convex risk sets, anytime-valid admissibility within the nonnegative supermartingale cone, marginal coverage validity over exchangeable prediction sets, and Cesàro approachability (CAA) admissibility, which reaches the risk-set boundary via approachability-style arguments rather than explicit priors. We prove a criterion separation theorem: the four classes of admissible procedures are pairwise non-nested. Each geometry carries a different certificate of optimality: a supporting-hyperplane prior (Blackwell), a nonnegative supermartingale (anytime-valid), an exchangeability rank (coverage), or a Cesàro steering argument (CAA). Martingale coherence is necessary for Blackwell admissibility and necessary and sufficient for anytime-valid admissibility within e-processes, but is not sufficient for Blackwell admissibility and is not necessary for coverage validity or CAA-admissibility. All four criteria share a common optimization template (minimize Bayesian risk subject to a feasibility constraint), but the constraint sets operate over different spaces, partial orders, and performance metrics, making them geometrically incompatible. Admissibility is irreducibly criterion-relative.

Paper Structure (50 sections, 23 theorems, 13 equations, 7 figures, 6 tables)

Key Result

Lemma 3.1

Under Definitions 2.1–2.5, the risk set $\mathcal{R}$ is convex.
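
A minimal sketch of the standard randomization argument behind a convexity statement of this kind, assuming (the excerpt does not confirm this) that $\mathcal{R}$ collects the risk vectors $r(\delta)=(R(\theta,\delta))_{\theta\in\Theta}$ of all randomized decision rules. The symbols $\delta_1,\delta_2,\delta_\lambda$ and $r(\cdot)$ are illustrative notation, not the paper's. Take two rules $\delta_1,\delta_2$ and $\lambda\in[0,1]$, and let $\delta_\lambda$ apply $\delta_1$ with probability $\lambda$ and $\delta_2$ otherwise, independently of the data. Then, for every $\theta\in\Theta$,

$$
R(\theta,\delta_\lambda)=\lambda\,R(\theta,\delta_1)+(1-\lambda)\,R(\theta,\delta_2),
$$

so $r(\delta_\lambda)=\lambda\,r(\delta_1)+(1-\lambda)\,r(\delta_2)\in\mathcal{R}$. The paper's own proof may proceed differently.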

Figures (7)

  • Figure 1: Risk set geometry for $|\Theta|=2$. The convex risk set $\mathcal{R}$ (shaded) maps each decision rule to a risk vector. The lower boundary $\partial_-\mathcal{R}$ (bold curve) contains all admissible rules. At an admissible point $r^*$, the supporting hyperplane (dashed line) identifies the prior $\Pi$ whose normal $\pi$ defines the Bayes problem that $r^*$ solves (Theorem \ref{thm:supporting}). Interior points are dominated.
  • Figure 2: Concrete risk set for Bernoulli log-loss prediction with $\Theta=\{0.3,\,0.7\}$, $n=10$. The Bayes predictive $\hat{p}_n^B=(S_n+\tfrac{1}{2})/(n+1)$ under $\mathrm{Beta}(\tfrac{1}{2},\tfrac{1}{2})$ and the Laplace predictive $\hat{p}_n^U=(S_n+1)/(n+2)$ under $\mathrm{Beta}(1,1)$ both lie on the lower boundary $\partial_-\mathcal{R}$; each is no-shame with respect to a different prior (dashed supporting hyperplanes). The plug-in MLE $\hat{p}_n^{\mathrm{pi}}=S_n/n$ lies in the interior: its risk vector is dominated because it assigns zero probability to events that occur with positive probability, producing infinite log-loss contributions.
  • Figure 3: Supermartingale cone for anytime-valid inference. An e-process $E_t$ starts at $E_0=1$ and must remain a nonnegative supermartingale under every $\mathbb{P}\in\mathcal{H}_0$; this defines the feasible cone $\mathcal{C}_{\mathrm{AV}}$. An admissible e-process (solid, red) is a nonnegative martingale within the cone (Theorem \ref{thm:ramdas}). A process that grows systematically faster (dotted, gray) violates the supermartingale condition under some $\mathbb{P}\in\mathcal{H}_0$ and is inadmissible. Stopping at any data-dependent time $\tau$ preserves type-I error control at level $\alpha$ via Ville's inequality: under every $\mathbb{P}\in\mathcal{H}_0$, the stopped value satisfies $E_\tau\le 1/\alpha$ with probability at least $1-\alpha$ (orange).
  • Figure 4: Coverage-feasible region for prediction sets. The feasibility constraint $\mathbb{P}(Y_{n+1}\in\hat{C}_n)\ge 1-\alpha$ defines the half-space $\mathcal{C}_{\mathrm{Cov}}$ (shaded, above the threshold). The conformal set $\hat{C}_n^{\mathrm{conf}}$ lies on the coverage frontier: it achieves exactly $1-\alpha$ marginal coverage with the minimum width achievable by exchangeability-based methods. An oracle Bayes interval $\hat{C}_n^{\mathrm{oracle}}$ optimized under the true $P_\theta$ can be shorter but may undercover; it lies below the threshold and is infeasible in $\mathcal{C}_{\mathrm{Cov}}$. A conservative set $\hat{C}_n^{\mathrm{wide}}$ overcovers but wastes width. Exact conditional coverage at every $x$ simultaneously is impossible for continuous distributions (Theorem \ref{thm:conformal-impossibility}).
  • Figure 5: Four admissibility geometries in diamond configuration. Each node represents an admissible class; dashed arrows indicate pairwise non-nesting (Theorems \ref{thm:separation} and \ref{thm:extended-separation}). Blackwell and CAA admissibility share the risk-set domain but differ in witness type (prior vs. fixed-point); anytime-valid and coverage admissibility operate on different procedure spaces entirely.
  • ...and 2 more figures
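
The comparison described in the Figure 2 caption can be sketched numerically. The snippet below is an illustration under stated assumptions, not the paper's computation: it takes the risk at $\theta$ to be the expected one-step-ahead log-loss after $n=10$ Bernoulli observations (the paper may instead use a cumulative or excess-risk version), and the names `risk`, `predictors`, and `risk_vectors` are hypothetical. The three predictive rules and the parameter values $\Theta=\{0.3,0.7\}$ come from the caption.

```python
import math

THETAS = (0.3, 0.7)  # two-point parameter space from the Figure 2 caption
N = 10               # sample size from the Figure 2 caption


def binom_pmf(s, n, theta):
    """P(S_n = s) for S_n ~ Binomial(n, theta)."""
    return math.comb(n, s) * theta**s * (1 - theta) ** (n - s)


def log_loss(p_hat, x):
    """Log-loss of forecasting P(X = 1) = p_hat when outcome x occurs."""
    q = p_hat if x == 1 else 1.0 - p_hat
    return math.inf if q <= 0.0 else -math.log(q)


def risk(predictor, theta, n=N):
    """Assumed risk: expected log-loss on X_{n+1}, averaged over S_n."""
    total = 0.0
    for s in range(n + 1):
        p_hat = predictor(s, n)
        # Expected loss on the next outcome given the forecast p_hat.
        next_step = theta * log_loss(p_hat, 1) + (1 - theta) * log_loss(p_hat, 0)
        total += binom_pmf(s, n, theta) * next_step
    return total


predictors = {
    "bayes_beta_half": lambda s, n: (s + 0.5) / (n + 1),  # Beta(1/2, 1/2) prior
    "laplace": lambda s, n: (s + 1) / (n + 2),            # Beta(1, 1) prior
    "plug_in_mle": lambda s, n: s / n,                    # forecasts 0 or 1 at the extremes
}

# One risk vector (risk at theta=0.3, risk at theta=0.7) per predictor.
risk_vectors = {
    name: tuple(risk(rule, theta) for theta in THETAS)
    for name, rule in predictors.items()
}

for name, vec in risk_vectors.items():
    print(name, vec)
```

Under this assumed risk, the two Bayes predictives have finite risk vectors, while the plug-in rule's risk is infinite at both parameter values because $S_n\in\{0,n\}$ occurs with positive probability and then forces a forecast of exactly 0 or 1.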

Theorems & Definitions (59)

  • Definition 2.1: Statistical decision problem
  • Definition 2.2: Decision rules
  • Definition 2.3: Risk function
  • Definition 2.4: Dominance and admissibility
  • Definition 2.5: Risk set
  • Lemma 3.1: Convexity
  • proof
  • Lemma 3.2: Existence of Bayes rules via Berge
  • proof
  • Proposition 3.3: Closedness
  • ...and 49 more