Asymptotic and bootstrap tests for subspace dimension

Klaus Nordhausen, Hannu Oja, David E. Tyler

Abstract

Most linear dimension reduction methods proposed in the literature can be formulated using an appropriate pair of scatter matrices, see e.g. Ye and Weiss (2003), Tyler et al. (2009), Bura and Yang (2011), Liski et al. (2014) and Luo and Li (2016). The eigen-decomposition of one scatter matrix with respect to another is then often used to determine the dimension of the signal subspace and to separate signal and noise parts of the data. Three popular dimension reduction methods, namely principal component analysis (PCA), fourth order blind identification (FOBI) and sliced inverse regression (SIR) are considered in detail and the first two moments of subsets of the eigenvalues are used to test for the dimension of the signal space. The limiting null distributions of the test statistics are discussed and novel bootstrap strategies are suggested for the small sample cases. In all three cases, consistent test-based estimates of the signal subspace dimension are introduced as well. The asymptotic and bootstrap tests are compared in simulations and illustrated in real data examples.
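As a rough illustration of the idea of using the first two moments of a subset of eigenvalues, the following sketch treats the PCA case only. It computes the mean and variance of the $p-q$ smallest eigenvalues of the sample covariance matrix. The function names, the population-style variance, and the scaling of the variance by the squared mean are assumptions made for illustration; they are not taken from the paper, and no null distribution or critical value is implied.

```python
import numpy as np


def tail_eigenvalue_moments(eigvals, q):
    """Mean and variance of the p - q smallest eigenvalues.

    Hypothetical helper: the variance uses the 1/(p - q) convention,
    which is an assumption, not necessarily the paper's definition.
    """
    d = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    if not 0 <= q < d.size:
        raise ValueError("q must satisfy 0 <= q < p")
    tail = d[q:]                       # candidate "noise" eigenvalues
    m1 = tail.mean()                   # first moment
    var = np.mean((tail - m1) ** 2)    # second central moment
    return m1, var


def pca_tail_statistic(X, q):
    """Scale-free spread of the p - q smallest covariance eigenvalues.

    Illustrative only: values near zero are what one would expect if
    the last p - q eigenvalues were equal (a q-dimensional signal).
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)        # p x p sample covariance matrix
    d = np.linalg.eigvalsh(S)          # real eigenvalues of a symmetric matrix
    m1, var = tail_eigenvalue_moments(d, q)
    return var / m1 ** 2


# Simulated data: 2 high-variance signal directions, 3 unit-variance noise ones.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) * np.array([5.0, 3.0, 1.0, 1.0, 1.0])
stats = {q: pca_tail_statistic(X, q) for q in range(4)}
```

In this simulated setting the statistic should be much larger for `q = 0` or `q = 1`, where signal eigenvalues are still in the tail, than for `q = 2`. Turning it into a test would require the limiting null distributions or bootstrap strategies discussed in the paper.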

Paper Structure

This paper contains 26 sections, 12 theorems, 68 equations, 4 figures, 15 tables.

Key Result

Lemma 1

Under the stated assumptions and $H_{0q}$, $n T_q=n s^2(\widehat{\boldsymbol{S}}_{22}) +O_P(n^{-1/2}).$

Figures (4)

  • Figure 1: Left figure: The original data set consisting of the SVRI values measured on 223 subjects at 4 time points. Right figure: The estimated signal part (upper curves) and noise part (lower curves) of the same data set.
  • Figure 2: The first three images in $\boldsymbol{Z}$ (upper row) and in $\widehat{\boldsymbol{Z}}$ (lower row).
  • Figure 3: Scatter plot matrix of the Australian athletes data.
  • Figure 4: Scatter plot matrix of the two selected SIR components against the response. Different plotting symbols have been used for men and women.

Theorems & Definitions (14)

  • Definition 1
  • Lemma 1
  • Theorem
  • Theorem
  • Corollary 1
  • Lemma 2
  • Theorem
  • Theorem
  • Corollary 2
  • Remark 1
  • ...and 4 more