Box Confidence Depth: simulation-based inference with hyper-rectangles

Elena Bortolato, Laura Ventura

TL;DR

Box-Confidence Depth (Box-CD) is a simulation-based frequentist framework that constructs calibrated, multivariate confidence regions by learning a depth over the parameter space from random hyper-rectangles in the summary-statistic space. The method uses a center-outward ordering via data depth and a simple acceptance rule based on whether the observed statistics lie inside the simulated boxes, yielding a depth function $\mathcal{CD}^{\text{box}}(\theta)$ from which confidence sets and point estimators can be read. The authors establish theoretical connections to confidence distributions, demonstrate invariance properties, discuss efficiency and optimality, and extend to high dimensions with an $S$-pseudo-sample generalization. Empirical studies across logistic regression, multivariate $t$, mixture, and Ricker's model show nominal coverage and competitive efficiency, with code and simulations openly available for replication and extension.

Abstract

This work presents a novel simulation-based approach for constructing confidence regions in parametric models. It is particularly suited to generative models and to situations where data are limited and conventional asymptotic approximations fail to provide accurate results. The method leverages the concept of data depth and relies on random hyper-rectangles, i.e. boxes, in the sample space, generated through simulations from the model under varying input parameters. A probabilistic acceptance rule makes it possible to retrieve a Depth-Confidence Distribution for the model parameters, from which point estimators as well as calibrated confidence sets can be read off. The method is designed to address cases where both the parameters and the test statistics are multivariate.
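The acceptance rule described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy model $N(\theta, 1)$, the uniform proposal on $[-3, 3]$, the choice of (mean, standard deviation) as summary statistics, and the function names `summary`, `simulate`, `box_accepts` and `box_cd_sample`. For each proposed $\theta$, two pseudo-samples are simulated, the box spanned by their summary statistics is formed, and $\theta$ is kept only if the observed statistics fall inside that box.

```python
import math
import random
import statistics

def summary(sample):
    """Two-dimensional summary statistic: (mean, standard deviation)."""
    m = sum(sample) / len(sample)
    var = sum((y - m) ** 2 for y in sample) / (len(sample) - 1)
    return (m, math.sqrt(var))

def simulate(theta, n, rng):
    """Toy generative model (an assumption): n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def box_accepts(theta, t_obs, n, rng, S=2):
    """Simulate S pseudo-samples at theta, build the hyper-rectangle spanned
    by their summary statistics, and accept theta if t_obs lies inside it."""
    stats = [summary(simulate(theta, n, rng)) for _ in range(S)]
    for j, t_j in enumerate(t_obs):
        column = [s[j] for s in stats]  # j-th coordinate across pseudo-samples
        if not (min(column) <= t_j <= max(column)):
            return False
    return True

def box_cd_sample(y_obs, n_proposals, rng, lo=-3.0, hi=3.0):
    """Return the proposals accepted by the box rule (uniform proposal on
    [lo, hi] is a hypothetical choice)."""
    t_obs = summary(y_obs)
    n = len(y_obs)
    accepted = []
    for _ in range(n_proposals):
        theta = rng.uniform(lo, hi)
        if box_accepts(theta, t_obs, n, rng):
            accepted.append(theta)
    return accepted

rng = random.Random(0)
y_obs = simulate(0.8, 20, rng)          # "observed" data from theta_0 = 0.8
accepted = box_cd_sample(y_obs, 20000, rng)
```

Under these assumptions, the accepted values in `accepted` concentrate around the observed sample mean, and their empirical distribution plays the role of the depth from which a point estimate or a region could be read off. How the paper turns accepted proposals into calibrated regions is not reproduced here.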

Paper Structure

This paper contains 18 sections, 7 theorems, 30 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Lemma 3.1

For a scalar parameter $\theta$, under Assumption ASSstatistic, the Box-Confidence Depth is

Figures (5)

  • Figure 1: Two examples of summary statistics, $t^{*1} = (t^{*1}_1, t^{*1}_2)$ and $t^{*2} = (t^{*2}_1, t^{*2}_2)$ computed on simulated pseudo-samples. Left: the proposal parameter would be accepted as the observed $t^{\text{obs}}$ lies within the Box. Right: the proposal is rejected as $t^{\text{obs}}$ falls outside this Box.
  • Figure 2: Top panels: an instance of a monotone $p$-value function $F_t(t^{\text{obs}}|\theta)$ (left), $1-F_t(t^{\text{obs}}|\theta)$ (center), and their product (right). Bottom panels: a non-monotone $p$-value function, for which the resulting Confidence-Depth is multimodal.
  • Figure 3: Left: number of accepted parameters out of 100000 proposals in the mixture example, for varying sample size $n$ and number of pseudo-samples $S$. Right: ratio of the number of accepted parameters to the number accepted with $S=2$.
  • Figure 4: Five replications of the same Box-CD function, with fixed $y^{\text{obs}}$ for the position parameter in the mixture model, with number of pseudo-samples $S$ varying. The vertical line indicates the true generating parameter $\theta_0=0.8$.
  • Figure 5: Two Monte Carlo confidence regions for the parameters $\log(r)$ and $\sigma^2$ in Ricker's model: the first (left) contains the true generating parameter, while the second (right) fails to include it.
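
The product shown in Figure 2 can be related to the box rule through a standard order-statistics fact: for two independent draws $T_1, T_2$ from a continuous distribution with CDF $F$, the probability that a fixed value $t$ lies between them is $2F(t)\{1-F(t)\}$. The sketch below checks this by Monte Carlo for a scalar statistic. The Gaussian model, the value `t_obs = 0.3`, and the function names are assumptions made for illustration; the normalisation used in the paper is not stated in this summary.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def box_acceptance_rate(t_obs, theta, n_sim, rng):
    """Fraction of simulated one-dimensional boxes [min(t1, t2), max(t1, t2)]
    that contain t_obs, with t1, t2 drawn from N(theta, 1) (toy assumption)."""
    hits = 0
    for _ in range(n_sim):
        t1 = rng.gauss(theta, 1.0)
        t2 = rng.gauss(theta, 1.0)
        if min(t1, t2) <= t_obs <= max(t1, t2):
            hits += 1
    return hits / n_sim

rng = random.Random(1)
t_obs = 0.3
results = []
for theta in (-1.0, 0.0, 0.3, 1.5):
    F = normal_cdf(t_obs, mu=theta)          # F_t(t_obs | theta)
    exact = 2.0 * F * (1.0 - F)              # probability that the box covers t_obs
    simulated = box_acceptance_rate(t_obs, theta, 50000, rng)
    results.append((theta, simulated, exact))
```

Each entry of `results` pairs a simulated acceptance rate with $2F(1-F)$. In this toy setting the acceptance probability peaks where $F_t(t^{\text{obs}}|\theta) = 1/2$, which is consistent with the center-outward ordering described in the TL;DR.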

Theorems & Definitions (19)

  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 3.3
  • proof
  • ...and 9 more