Bootstrap tests for almost goodness-of-fit

Amparo Baíllo, Javier Cárcamo

TL;DR

This paper develops Almost Goodness-of-Fit (AGoF) tests to determine whether a parametric model approximates the true distribution within a pre-specified margin $\epsilon$, using the $L^p$ distance between the empirical distribution and a model representative $G(\boldsymbol{\theta}_F)$. It builds two bootstrap-consistent procedures to approximate the critical region for the test, grounded in a rigorous asymptotic theory via Hadamard differentiability and empirical process convergence in $L^p$. The authors derive the limit distributions, provide practical implementation steps, and demonstrate the method through simulations and real-data applications (Haiti IgG serosurvey and carbon-fiber failure stress), including a model-improvement metric $G(F,\mathcal{G})$ that quantifies the relative gain over a non-informative benchmark. The framework enables flexible, margin-based model validation and robust comparison across competing parametric families, with clear guidance on choosing the margin and interpreting results for equivalence-type hypotheses.

Abstract

We introduce the almost goodness-of-fit test, a procedure to assess whether a (parametric) model provides a good representation of the probability distribution generating the observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{ G(\boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta\}$, we consider the testing problem \[ H_0: \| F - G(\boldsymbol{\theta}_F) \|_p \geq \epsilon \quad \text{vs} \quad H_1: \| F - G(\boldsymbol{\theta}_F) \|_p < \epsilon, \] where $\epsilon>0$ is a margin of error and $G(\boldsymbol{\theta}_F)$ denotes a representative of $F$ within the parametric class. The approximate model is determined via an M-estimator of the parameters. The methodology also quantifies the percentage improvement of the proposed model relative to a non-informative (constant) benchmark. The test statistic is the $\mathrm{L}^p$-distance between the empirical distribution function and that of the estimated model. We present two consistent, easy-to-implement, and flexible bootstrap schemes to carry out the test. The performance of the proposal is illustrated through simulation studies and real-data applications.
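As a rough illustration of the test statistic described in the abstract, the sketch below computes the $\mathrm{L}^p$-distance between the empirical distribution function and a fitted exponential model. Everything specific here is an assumption made for illustration, not the authors' implementation: the exponential family, the maximum-likelihood fit (one instance of an M-estimator), the trapezoidal-rule integration on a truncated grid, and the function names `ecdf`, `exponential_cdf`, and `lp_distance_to_exponential`. The bootstrap calibration of the critical value is not shown, since the abstract does not specify the two schemes.

```python
import numpy as np


def ecdf(sample, x):
    """Empirical distribution function of `sample` evaluated at the points `x`."""
    sorted_sample = np.sort(sample)
    return np.searchsorted(sorted_sample, x, side="right") / sorted_sample.size


def exponential_cdf(x, scale):
    """CDF of the exponential distribution with mean `scale` (zero for x <= 0)."""
    return 1.0 - np.exp(-np.maximum(x, 0.0) / scale)


def lp_distance_to_exponential(sample, p=1.0, grid_size=20001):
    """Approximate ||F_n - G(theta_hat)||_p for an exponential model.

    Assumptions (illustrative only): the sample is nonnegative, theta_hat is
    the maximum-likelihood estimate (the sample mean), and the integral is
    approximated by the trapezoidal rule on [0, max(sample) + 40 * theta_hat],
    beyond which the integrand is numerically negligible.
    """
    sample = np.asarray(sample, dtype=float)
    scale_hat = sample.mean()
    upper = sample.max() + 40.0 * scale_hat
    grid = np.linspace(0.0, upper, grid_size)
    integrand = np.abs(ecdf(sample, grid) - exponential_cdf(grid, scale_hat)) ** p
    dx = grid[1] - grid[0]
    integral = dx * np.sum(0.5 * (integrand[1:] + integrand[:-1]))
    return integral ** (1.0 / p)


rng = np.random.default_rng(0)
exp_sample = rng.exponential(scale=1.0, size=2000)   # data that follow the model
weibull_sample = rng.weibull(2.0, size=2000)         # Weibull(2,1): model is misspecified

d_exp = lp_distance_to_exponential(exp_sample, p=1.0)
d_weibull = lp_distance_to_exponential(weibull_sample, p=1.0)
print(f"L1 distance, exponential data: {d_exp:.4f}")
print(f"L1 distance, Weibull(2,1) data: {d_weibull:.4f}")
```

In the testing problem above, a small value of this statistic relative to the margin $\epsilon$ is evidence for $H_1$; how small it must be is what the bootstrap critical value determines.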

Paper Structure

This paper contains 15 sections, 6 theorems, 71 equations, 4 figures, 3 tables.

Key Result

Proposition 1

Let $\epsilon>0$ be fixed. For the testing problem (AGoFspecific), the rejection region in (c_alpha) fulfills the following properties:

Figures (4)

  • Figure 1: Power function for (a) the Weibull(2,1) and the exponential model; (b) a normal mixture and the normal model; (c) a negative binomial and a Poisson model. The vertical red line is located at $\|F-G(\boldsymbol\theta_F)\|_p$.
  • Figure 2: Power function for (a) the Kumaraswamy(2,2) and the beta model; (b) the Student $t_4$ and the normal model; (c) the lognormal(0.5,0.5) and the gamma model. The vertical red line is located at $\|F-G(\boldsymbol\theta_F)\|_p$.
  • Figure 3: Histogram of log(MFI-bg) for antigen Bm33, normal fit and 2-component normal mixture fit.
  • Figure 4: For antigen Bm33, values of $\epsilon_k^*(0.05)$ when (a) $p=1$ and (b) $p=2$. Black points are the empirical $\mathrm{L}^p$-distances.

Theorems & Definitions (7)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Theorem 3
  • Corollary 2
  • proof