Data-driven Smooth Tests for Normality in ANOVA When the Number of Groups is Large

Peiwen Jia, Xiaojun Song, Haoyu Wei

Abstract

The normality assumption for random errors is fundamental in analysis of variance (ANOVA) models. However, it is rarely subjected to formal testing in practice, and theoretically justified procedures are largely unavailable, especially when the number of groups diverges. In this paper, we develop Neyman's smooth tests for assessing normality in a broad class of ANOVA models, allowing the number of groups to diverge. The proposed test statistics are constructed via the Gaussian probability integral transformation of ANOVA residuals. We show that using residuals induces non-negligible parameter estimation effects, whose structure depends on the underlying ANOVA model and which play a crucial role in determining the form of the test statistics and their asymptotic behavior. Under the null hypothesis of normality, the resulting statistics are asymptotically chi-square distributed, with degrees of freedom determined by the order of the smooth test (i.e., the number of components it includes). We further propose a modified version of Schwarz's selection rule to determine the order automatically, which yields fully data-driven smooth tests that require no additional tuning parameters. Simulation studies and a real-data example indicate that the proposed tests perform well in practice and are readily applicable.
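The abstract outlines a general recipe: transform observations with the Gaussian CDF, project them onto orthonormal functions on [0, 1], sum the squared components up to an order k, and choose k with a Schwarz-type penalty. The sketch below illustrates only the classical form of that recipe. It tests a fully specified null H0: X ~ N(0, 1) on directly observed data, uses shifted Legendre polynomials as an assumed basis, and applies the unmodified penalty k log n. It deliberately omits the residual-based parameter estimation correction and the modified selection rule that the paper contributes, so it should not be applied to ANOVA residuals as written. All function names are hypothetical. In this classical setting the selected order tends to 1 under the null, so the selected statistic is compared with an asymptotic chi-square(1) quantile. In small samples, simulated critical values are usually more reliable.

```python
import math
import random
from statistics import NormalDist


def legendre_orthonormal(j, u):
    """Orthonormal shifted Legendre polynomial pi_j on [0, 1] (hypothetical helper).

    Uses the three-term recurrence for P_j on [-1, 1], then rescales by
    sqrt(2j + 1) so that the integral of pi_j(u)**2 over [0, 1] equals 1.
    """
    if j == 0:
        return 1.0
    x = 2.0 * u - 1.0
    p_prev, p = 1.0, x  # P_0(x), P_1(x)
    for n in range(1, j):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return math.sqrt(2 * j + 1) * p


def smooth_statistics(sample, max_order):
    """Return [T_1, ..., T_d] for the classical smooth test of H0: X ~ N(0, 1).

    T_k = sum_{j=1}^{k} (n^{-1/2} * sum_i pi_j(Phi(X_i)))**2. With a fully
    specified null (no estimated parameters), T_k is asymptotically chi-square(k).
    """
    phi = NormalDist().cdf
    n = len(sample)
    u = [phi(x) for x in sample]  # Gaussian probability integral transform
    stats, running = [], 0.0
    for j in range(1, max_order + 1):
        component = sum(legendre_orthonormal(j, ui) for ui in u) / math.sqrt(n)
        running += component ** 2
        stats.append(running)
    return stats


def schwarz_order(stats, n):
    """Smallest k maximizing T_k - k * log(n) (classical, unmodified penalty)."""
    penalized = [t - (k + 1) * math.log(n) for k, t in enumerate(stats)]
    return penalized.index(max(penalized)) + 1


def data_driven_smooth_test(sample, max_order=10, alpha=0.05):
    """Return (selected order, T at that order, reject H0?)."""
    n = len(sample)
    stats = smooth_statistics(sample, max_order)
    k_hat = schwarz_order(stats, n)
    t_selected = stats[k_hat - 1]
    # Upper-alpha quantile of chi-square(1) equals z_{1 - alpha/2} squared.
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    return k_hat, t_selected, t_selected > critical


if __name__ == "__main__":
    rng = random.Random(0)
    normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
    shifted_sample = [rng.gauss(3.0, 1.0) for _ in range(500)]
    print(data_driven_smooth_test(normal_sample))
    print(data_driven_smooth_test(shifted_sample))
```

Building the statistics cumulatively means that every T_k for k up to the maximum order is available from a single pass over the basis, which is all the penalized selection step needs.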

Paper Structure

This paper contains 20 sections, 12 theorems, 161 equations, 5 figures, 14 tables.

Key Result

Theorem 1

Suppose Assumptions 1 and 2 hold. Then, under the null hypothesis $H_0: e \sim \Phi(x)$, for $k=1,\ldots,K$, as $N\to\infty$.

Figures (5)

  • Figure 1: The sample means and error bars of $\widehat{K}$ in Experiment I
  • Figure 2: The sample means and error bars of $\widehat{K}$ in Experiment II
  • Figure 3: The sample means and error bars of $\widehat{K}$ in Experiment III
  • Figure 4: The sample means and error bars of $\widehat{K}$ in Experiment IV
  • Figure 5: The sample means and error bars of $\widehat{K}$ in Experiment III$^\prime$

Theorems & Definitions (14)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Corollary 2
  • ...and 4 more