Testing Many Constraints in Possibly Irregular Models Using Incomplete U-Statistics

Nils Sturma, Mathias Drton, Dennis Leung

TL;DR

This paper considers the problem of testing a null hypothesis defined by equality and inequality constraints on a statistical parameter. The proposed methodology applies, in particular, when the constraints to be tested are polynomials in U-estimable parameters, and it is illustrated with goodness-of-fit tests of latent tree models for multivariate data.

Abstract

We consider the problem of testing a null hypothesis defined by equality and inequality constraints on a statistical parameter. Testing such hypotheses can be challenging because the number of relevant constraints may be on the same order or even larger than the number of observed samples. Moreover, standard distributional approximations may be invalid due to irregularities in the null hypothesis. We propose a general testing methodology that aims to circumvent these difficulties. The constraints are estimated by incomplete U-statistics, and we derive critical values by Gaussian multiplier bootstrap. We show that the bootstrap approximation of incomplete U-statistics is valid for kernels that we call mixed degenerate when the number of combinations used to compute the incomplete U-statistic is of the same order as the sample size. It follows that our test controls type I error even in irregular settings. Furthermore, the bootstrap approximation covers high-dimensional settings making our testing strategy applicable for problems with many constraints. The methodology is applicable, in particular, when the constraints to be tested are polynomials in U-estimable parameters. As an application, we consider goodness-of-fit tests of latent tree models for multivariate data.
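To make the idea of estimating a constraint by an incomplete U-statistic concrete, the following is a minimal sketch rather than the authors' implementation. It assumes mean-zero data, a degree-2 kernel for a single tetrad constraint, and Bernoulli sampling of pairs with success probability $N / \binom{n}{2}$. The function names and the toy one-factor parameters are hypothetical.

```python
import itertools
import numpy as np


def tetrad_kernel(x, y, i, j, k, l):
    """Symmetrized degree-2 kernel for one tetrad.

    Assumption: the data have mean zero, so that for independent x, y
    E[x_i x_j y_k y_l - x_i x_l y_j y_k] = s_ij * s_kl - s_il * s_jk.
    """
    def g(a, b):
        return a[i] * a[j] * b[k] * b[l] - a[i] * a[l] * b[j] * b[k]

    return 0.5 * (g(x, y) + g(y, x))


def incomplete_u_statistic(X, kernel, N, rng):
    """Average `kernel` over a random subset of all sample pairs.

    Each pair is kept independently with probability N / (number of pairs),
    so roughly N kernel evaluations are performed (Bernoulli sampling).
    """
    n = X.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    prob = min(1.0, N / len(pairs))
    keep = rng.random(len(pairs)) < prob
    chosen = [pairs[t] for t in np.flatnonzero(keep)]
    if not chosen:
        raise ValueError("no pairs were sampled; increase N")
    values = [kernel(X[a], X[b]) for a, b in chosen]
    return float(np.mean(values)), len(chosen)


# Toy data from a one-factor model (hypothetical parameter values).
rng = np.random.default_rng(0)
n, l = 200, 4
factor = rng.normal(size=(n, 1))
X = factor @ np.ones((1, l)) + rng.normal(size=(n, l))

stat, n_used = incomplete_u_statistic(
    X, lambda x, y: tetrad_kernel(x, y, 0, 1, 2, 3), N=2 * n, rng=rng
)
print(f"tetrad estimate: {stat:.3f} from {n_used} sampled pairs")
```

With $N = 2n$ the statistic uses about 400 of the 19,900 available pairs, which is the sense in which the number of combinations is of the same order as the sample size. The sketch covers only the estimation step; the critical values from the Gaussian multiplier bootstrap are not reproduced here.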

Paper Structure (26 sections, 24 theorems, 152 equations, 9 figures)

Key Result

Theorem 2.4

Assume that conditions C1-C6 hold. Then there is a constant $C_{\beta} > 0$, depending only on $\beta$, such that [displayed bound missing from this extract], where $Y \sim N_p(0, m^2 \Gamma_g + \alpha_n \Gamma_h)$.

Figures (9)

  • Figure 1: Histograms of $5,000$ simulated $p$-values for simultaneously testing $2730$ tetrad constraints implied by the one-factor model with $l=15$ observed variables. The computational budget parameter for the incomplete $U$-statistic is $N=2n$, and the true covariance matrix is close to an irregular point; for exact parameter values, see Section \ref{sec:tree-models}, setup (b).
  • Figure 2: Graphical representation of (i) the star tree and (ii) the binary caterpillar tree. Solid black dots correspond to leaves (observed variables).
  • Figure 3: Empirical sizes vs. nominal levels for testing tetrads based on $500$ experiments. The computational budget parameter $N$ is varied as indicated and empirical sizes of the LR test are also shown. Data is generated from setup (a) with $(l,n)=(15,500)$.
  • Figure 4: Empirical sizes vs. nominal levels for testing tetrads based on $500$ experiments. The computational budget parameter $N$ is varied as indicated and empirical sizes of the LR test are also shown. Data is generated from setups (b) and (c) with $(l,n)=(15,500)$.
  • Figure 5: Empirical power for different local alternatives based on $500$ experiments. The computational budget parameter $N$ is varied as indicated and empirical power of the LR test is also shown. Local alternatives are generated as described in the text for setup (a) with $(l,n)=(15,500)$ and level $\alpha=0.05$.
  • ...and 4 more figures
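
The count of $2730$ constraints in the Figure 1 caption is consistent with keeping two tetrads for each four-element subset of the $l=15$ observed variables. Each such subset has three tetrads, and any two of them determine the third. That exactly two are kept per subset is an assumption made here to reproduce the number, checked by the short calculation below.

```python
from math import comb

l = 15                      # number of observed variables (from the caption)
subsets = comb(l, 4)        # four-element subsets of the variables
tetrads_per_subset = 2      # assumption: two tetrads kept per subset
total = tetrads_per_subset * subsets
print(total)
```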

Theorems & Definitions (62)

  • Example 1.1
  • Definition 1.2
  • Example 1.3
  • Example 1.4
  • Remark 1.5: Nonparametric setups
  • Remark 1.6: Comparison to literature on shape restrictions
  • Definition 2.1
  • Remark 2.2: Discussion on mixed degeneracy
  • Remark 2.3: Parametric families and irregular points
  • Theorem 2.4
  • ...and 52 more