On the Calibration of Bayesian Success Criteria and Operating Characteristics for Clinical Trials

Peng Yang, Li Wang, Ying Yuan

Abstract

Recently, the U.S. Food and Drug Administration (FDA) released draft guidance \citep{FDA2026} signaling a paradigm shift that facilitates the use of Bayesian methodology as the primary analysis and decision framework for drug approval. The cornerstone and fundamental challenge of this framework is the specification and calibration of Bayesian success criteria to control decision errors, ensuring reliable clinical and regulatory outcomes. In this work, we systematically investigate various Bayesian decision-error metrics, their theoretical interrelationships, and their alignment with conventional Frequentist counterparts. This investigation provides critical theoretical insights and practical guidance on calibrating Bayesian success criteria and operating characteristics to ensure robust decision-making and the integrity of public health decisions. We illustrate this framework using a clinical trial evaluating revascularization strategies for cardiogenic shock. A Shiny application will be available at www.trialdesign.org to assist sponsors and regulators in evaluating calibration strategies consistent with recent regulatory perspectives.

Paper Structure

This paper contains 30 sections, 7 theorems, 79 equations, 5 figures, 4 tables.

Key Result

Proposition 1

The Bayesian power $\beta_B(c)$ admits a decomposition in terms of $\gamma_1=\Pr(\theta>\delta)$ and $\gamma_0=1-\gamma_1$. In practical settings where $\alpha_B(c) < \beta_C(c)$, it follows that $\beta_B(c) \le \beta_C(c)$. Furthermore, if the design prior $\pi_{\mathrm{d}}(\theta)$ concentrates entirely on the effective region ($\gamma_1 \to 1$), then $\beta \ldots$
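
One form of the decomposition that is consistent with the statement above follows from the law of total probability. This is a sketch under assumptions, not a restatement of the authors' formula: it assumes that $\beta_C(c)$ denotes the probability of trial success conditional on $\theta>\delta$, that $\alpha_B(c)$ denotes the probability of trial success conditional on $\theta\le\delta$, and that all probabilities are taken under the design prior $\pi_{\mathrm{d}}(\theta)$.

```latex
\begin{align*}
\beta_B(c)
  &= \Pr(\text{success}) \\
  &= \Pr(\theta>\delta)\,\Pr(\text{success}\mid\theta>\delta)
   + \Pr(\theta\le\delta)\,\Pr(\text{success}\mid\theta\le\delta) \\
  &= \gamma_1\,\beta_C(c) + \gamma_0\,\alpha_B(c), \\[4pt]
\beta_B(c) - \beta_C(c)
  &= -\gamma_0\,\bigl(\beta_C(c) - \alpha_B(c)\bigr) \;\le\; 0
  \quad\text{whenever } \alpha_B(c) < \beta_C(c).
\end{align*}
```

Under these assumed definitions, $\beta_B(c)$ is a weighted average of $\beta_C(c)$ and $\alpha_B(c)$, so it approaches $\beta_C(c)$ as $\gamma_0 \to 0$, that is, as $\gamma_1 \to 1$.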

Figures (5)

  • Figure 1: Operating characteristics for single-arm continuous endpoints across posterior probability cutoffs $c$, evaluated under three design priors: pessimistic ($\pi_{\mathrm{d},T}(\theta_T) = N(-0.1, 0.15^2)$), neutral ($\pi_{\mathrm{d},T}(\theta_T) = N(0, 0.15^2)$), and optimistic ($\pi_{\mathrm{d},T}(\theta_T) = N(0.1, 0.15^2)$), corresponding to $\gamma_1 = 0.252$, $0.5$, and $0.748$, respectively. The vertical dashed line indicates $c = 0.975$. The horizontal dashed lines in the error-rate and power panels indicate reference levels of $0.025$ and $0.80$, respectively. The trial design assumes a clinical margin of $\delta = 0$, a sample size of $n_T = 74$, standard deviation $\sigma = 1$, and a non-informative analysis prior ($\pi_{\mathrm{a}, T}(\theta_T) = N(0, 10^6)$).
  • Figure 2: Difference between the probability of an incorrect decision (PID) and the Frequentist Type I error at posterior probability cutoffs $c = 0.90$, $0.95$, and $0.975$, under design priors with standard deviation equal to $0.15$. The x-axis, $\gamma_1$, represents the prevalence of effective trials under the design prior. Vertical dashed lines indicate the threshold values of $\gamma_1$ at which the PID transitions from exceeding to falling below the Type I error rate.
  • Figure S1: Operating characteristics for single-arm continuous endpoints under a neutral design prior ($N(0, \sigma^2_{\mathrm{d}})$), evaluated for design-prior standard deviations $\sigma_{\mathrm{d}} = 0.10$, $0.15$, and $0.20$ (different colors). The vertical dashed line marks $c = 0.975$. Horizontal dashed lines indicate reference levels of $0.025$ (error-rate panels) and $0.80$ (power panels). The trial design assumes a clinical margin of $\delta = 0$, a sample size of $n_T = 74$, standard deviation $\sigma = 1$, and a non-informative analysis prior ($\pi_{\mathrm{a},T}(\theta_T) = N(0,10^6)$).
  • Figure S2: Operating characteristics for single-arm binary endpoints across posterior probability cutoffs $c$, evaluated under three design priors: pessimistic ($\pi_{\mathrm{d}, T}(\theta_T) = \text{Beta}(3,12)$), neutral ($\pi_{\mathrm{d}, T}(\theta_T) = \text{Beta}(6,14)$), and optimistic ($\pi_{\mathrm{d}, T}(\theta_T) = \text{Beta}(9,14)$), corresponding to $\gamma_1 = 0.257$, $0.443$, and $0.652$, respectively. The vertical dashed line marks $c = 0.975$, and the horizontal dashed lines denote 0.025 in the error-rate panel and 0.80 in the power panel. The trial design assumes a clinical margin of $\delta = 0.3$, a sample size of $n_T = 74$, and a non-informative analysis prior ($\pi_{\mathrm{a}, T}(\theta_T) = \text{Beta}(1,1)$).
  • Figure S3: Operating characteristics for two-arm time-to-event endpoints across posterior probability cutoffs $c$, evaluated under three design priors: optimistic ($\pi_{\mathrm{d}}(\theta) = N(-0.1, 0.25^2)$), neutral ($\pi_{\mathrm{d}}(\theta) = N(0, 0.25^2)$), and pessimistic ($\pi_{\mathrm{d}}(\theta) = N(0.1, 0.25^2)$), corresponding to $\gamma_1 = 0.655$, $0.500$, and $0.345$, respectively. The vertical dashed line marks $c = 0.975$, and the horizontal dashed lines denote 0.025 in the error-rate panel and 0.80 in the power panel. The trial design assumes a clinical margin of $\delta = 0$ and a non-informative analysis prior ($\pi_{\mathrm{a}}(\theta) = N(0, 10^6)$).
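
The $\gamma_1$ values quoted in the captions for the normal design priors (Figures 1 and S3) can be reproduced directly from the normal CDF. The sketch below does that and, as a further illustration, evaluates the probability of trial success for the Figure 1 setting. It rests on assumptions not stated in the captions: the analysis prior $N(0, 10^6)$ is treated as flat, so the posterior is approximately $N(\bar{y}, \sigma^2/n_T)$ and success is declared when $\bar{y} > \delta + z_c\,\sigma/\sqrt{n_T}$; for Figure S3 the effective region is taken to be $\theta < \delta$, which is what the caption's $\gamma_1$ values imply. The function names `gamma1` and `success_probability` are hypothetical and not taken from the paper or its Shiny application.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and tail areas


def gamma1(mean, sd, delta=0.0, larger_is_better=True):
    """Prior probability that theta lies in the effective region
    under a N(mean, sd^2) design prior."""
    below = NormalDist(mean, sd).cdf(delta)
    return 1.0 - below if larger_is_better else below


def success_probability(theta_mean, theta_sd, c, n=74, sigma=1.0, delta=0.0):
    """Probability that Pr(theta > delta | data) exceeds the cutoff c.

    Assumes a flat analysis prior (posterior approx. N(ybar, sigma^2 / n)).
    theta ~ N(theta_mean, theta_sd^2); theta_sd = 0 fixes theta at
    theta_mean, which gives the frequentist probability at that value.
    """
    se = sigma / sqrt(n)
    threshold = delta + STD_NORMAL.inv_cdf(c) * se  # success iff ybar > threshold
    predictive_sd = sqrt(theta_sd ** 2 + se ** 2)   # sd of ybar, averaged over theta
    return 1.0 - STD_NORMAL.cdf((threshold - theta_mean) / predictive_sd)


# Figure 1 design priors (sd 0.15, effective region theta > 0);
# the caption reports gamma_1 = 0.252, 0.5, 0.748.
fig1_gamma1 = {
    name: gamma1(mean, 0.15)
    for name, mean in [("pessimistic", -0.1), ("neutral", 0.0), ("optimistic", 0.1)]
}

# Figure S3 design priors (sd 0.25, effective region assumed to be theta < 0);
# the caption reports gamma_1 = 0.655, 0.500, 0.345.
figS3_gamma1 = {
    name: gamma1(mean, 0.25, larger_is_better=False)
    for name, mean in [("optimistic", -0.1), ("neutral", 0.0), ("pessimistic", 0.1)]
}

if __name__ == "__main__":
    for name, value in fig1_gamma1.items():
        print(f"Figure 1, {name}: gamma_1 = {value:.3f}")
    for name, value in figS3_gamma1.items():
        print(f"Figure S3, {name}: gamma_1 = {value:.3f}")
    # Frequentist Type I error at theta = delta for cutoff c = 0.975
    print(f"Type I error at c = 0.975: {success_probability(0.0, 0.0, 0.975):.3f}")
```

Under the flat-prior assumption, the Frequentist Type I error at $\theta = \delta$ equals $1 - c$, which is why the cutoff $c = 0.975$ lines up with the $0.025$ reference level in the figures. Passing a design-prior mean and standard deviation to `success_probability` instead gives the success probability averaged over that prior.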

Theorems & Definitions (7)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • Corollary 3.1
  • Theorem 2
  • Proposition 4