Inverse set estimation and inversion of simultaneous confidence intervals

Junting Ren, Fabian J. E. Telschow, Armin Schwartzman

TL;DR

This work develops a finite-sample framework for inverse set estimation by inverting pre-built simultaneous confidence intervals (SCIs). By constructing inner and outer confidence sets for inverse upper, inverse lower, and inverse interval excursion sets, the authors guarantee exact coverage for all levels $c$ in $\mathbb R$ (via the SCI) or conservatively for finite level collections, thereby enabling non-asymptotic, level-flexible inference on $\mu^{-1}(U)$ even on non-dense domains. A non-parametric bootstrap SCI for regression, along with accompanying R code, extends the methodology to linear and logistic regression and to high-dimensional coefficient settings, with robust finite-sample performance demonstrated through comprehensive simulations. The approach is applied to climate risk mapping and to prediction uncertainty in COVID outcomes with statin use, illustrating how policymakers or clinicians can interpret regions or predictor settings that exceed specified thresholds with rigorous error control across many levels. Overall, the paper offers a broadly applicable, data-agnostic toolkit for simultaneous inverse-set inference that accommodates diverse data modalities while guarding against data peeking and multiple-threshold cherry-picking.

Abstract

Motivated by the questions of risk assessment in climatology (temperature change in North America) and medicine (impact of statin usage and COVID-19 on hospitalized patients), we address the problem of estimating the set in the domain of a function whose image equals a predefined subset. Existing methods that construct confidence sets require strict assumptions. We generalize the estimation of such sets to dense and non-dense domains with protection against "data peeking" by proving that confidence sets of multiple levels can be simultaneously constructed with the desired confidence non-asymptotically through inverting simultaneous confidence bands. A non-parametric bootstrap algorithm and code are provided.
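
The inversion step described above can be illustrated with a minimal Python sketch. It is not the authors' R code. It assumes the SCI bounds are already available as arrays over a grid of domain points, and it follows the rule stated in the figure captions: the inner set is where the lower bound reaches the level, the outer set is where the upper bound reaches the level. The function name `invert_sci`, the toy mean curve, the constant half-width, and the use of non-strict inequalities are all assumptions made for illustration.

```python
import numpy as np

def invert_sci(lower, upper, estimate, levels):
    """Invert SCI bounds into inner/estimated/outer sets for {s : mu(s) >= c}.

    Returns a dict mapping each level c to boolean masks over the grid.
    The non-strict inequality (>=) is an assumption of this sketch.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    sets = {}
    for c in levels:
        sets[c] = {
            "inner": lower >= c,        # lower bound already reaches c
            "estimate": estimate >= c,  # plug-in estimate of the inverse set
            "outer": upper >= c,        # upper bound still reaches c
        }
    return sets

# Toy example: hypothetical estimate with a constant-width band.
s = np.linspace(0.0, 1.0, 11)
estimate = np.sin(np.pi * s)
half_width = 0.15  # hypothetical SCI half-width, not from the paper
lower, upper = estimate - half_width, estimate + half_width

levels = [0.2, 0.8]
sets = invert_sci(lower, upper, estimate, levels)
for c in levels:
    print(c, s[sets[c]["inner"]], s[sets[c]["outer"]])
```

Because the same pair of bounds is reused for every level, the sets for all levels are nested by construction (inner within estimate within outer), which is what lets one SCI serve many thresholds at once.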

Paper Structure (26 sections, 6 theorems, 46 equations, 11 figures, 2 tables, 3 algorithms)

This paper contains 26 sections, 6 theorems, 46 equations, 11 figures, 2 tables, 3 algorithms.

Key Result

Proposition 1

For a fixed level $c \in \mathbb{R}$ and SCIs with type I family-wise error rate $\alpha$, we have

Figures (11)

  • Figure 1: Confidence sets for the increase of the mean summer temperature (June–August) in North America between the 20th and 21st centuries according to the specific climate model analyzed in Sommerfeld et al. (2018). Heat maps show the estimate of the mean difference. The first row displays the contours of the outer confidence sets, estimated inverse set, and the inner confidence sets, for various levels. The three plots in the second row display the confidence sets for the inverse sets, where the estimated mean difference is greater than or equal to the individual level 1.5, 2.0, or 2.5, respectively. In the second row, the blue line is the contour of the outer confidence set, the green line is the contour of the estimated inverse set, and the red line is the contour of the inner confidence set.
  • Figure 2: Simultaneous confidence set for the probability of severe outcome. We fixed other variables at ACE = 0, ARB = 0, sex = Male, CKD = 1, hypertension = 1, CVD = 1, diabetes = 1, obesity = 1. The gray shaded area shows the 95% SCIs; the solid black line is the estimated probability. The red horizontal line shows the inner confidence sets (where the lower SCIs are greater than the corresponding level), which are contained in the estimated inverse upper excursion set, shown as the green and red horizontal lines (where the estimated means are greater than the corresponding levels); the outer confidence sets are shown as the blue, green, and red lines (where the upper SCIs are greater than the corresponding levels) and contain both the estimated inverse sets and the inner confidence sets.
  • Figure 3: 1D dense functional data simulation showcase. Demonstration of using SCB to find regions of $s$ where the true mean is greater than or equal to the three levels $0, 0.2, 0.8$ for 1D dense functional data. The gray shaded area is the 95% SCB, the solid black line is the true mean. The red horizontal line shows the inner confidence sets (where the lower SCB is greater than the corresponding level) that are contained in the true inverse set represented by the union of the green and red horizontal line (where the true mean is greater than the corresponding levels); the outer confidence sets are the union of the blue, green and red line (where the upper SCB is greater than the corresponding levels) and contain both the true inverse sets and the inner confidence sets.
  • Figure 4: 2D dense functional data simulation showcase. The first row displays the contours of the confidence sets in one single plot for the outer confidence sets, estimated inverse set, and inner confidence sets, respectively. The three plots in the second row display the contours of the confidence sets for where the true mean is greater than or equal to the individual level 0.3, 0.5, or 0.7, respectively. The blue line is the contour of the outer confidence set, the green line is the contour of the estimated inverse set, and the red line is the contour of the inner confidence set.
  • Figure 5: Dense functional data simulation: coverage rate of confidence sets for different numbers of levels for inverse upper excursion sets. The dashed black lines mark 95% plus or minus twice the standard error, that is, the standard deviation of a Bernoulli random variable with $p = 0.95$ divided by $\sqrt{5000}$.
  • ...and 6 more figures
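
As a quick check of the reference band described in the Figure 5 caption, the arithmetic can be reproduced in a few lines of Python. This is a sketch under one assumption not stated in the caption: that 5000 is the number of simulation replications used to estimate each coverage rate. The variable names are illustrative.

```python
import math

p = 0.95      # nominal coverage level from the caption
n_sim = 5000  # assumed: number of simulation replications

# Standard deviation of a Bernoulli(p) variable divided by sqrt(n_sim)
se = math.sqrt(p * (1 - p)) / math.sqrt(n_sim)

# Band of plus or minus two standard errors around the nominal level
band = (p - 2 * se, p + 2 * se)
print(f"se = {se:.5f}, band = {band[0]:.4f} to {band[1]:.4f}")
```

Under that assumption the band runs from roughly 0.9438 to 0.9562, so an empirical coverage rate inside this range is consistent with the nominal 95% level up to Monte Carlo error.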

Theorems & Definitions (15)

  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Remark
  • Corollary 1
  • proof
  • Remark
  • Corollary 2
  • proof
  • ...and 5 more