
Two-sample inference for sparse functional data

Chi Zhang, Peijun Sang, Yingli Qin

TL;DR

This work addresses two-sample inference on mean functions for sparse, irregularly sampled functional data without assuming a common covariance structure across groups. It develops an RKHS-based smoothing-spline mean estimator for each group and derives a functional Bahadur representation, which yields pointwise limiting distributions and weak convergence. A multiplier bootstrap is proposed to calibrate pointwise confidence intervals and a global two-sample test. The methods are demonstrated through simulations and two real datasets (DTI FA profiles and Beijing PM2.5). Across varying sparsity levels and covariance settings, the method controls the Type I error well and performs competitively with, or better than, existing approaches in both estimation and inference.
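The pipeline summarized above can be sketched in Python: estimate each group's mean in an RKHS, then calibrate a global test with a multiplier bootstrap. This is a minimal illustration under stated assumptions, not the authors' implementation. Several choices are ours and hypothetical:

- A Gaussian-kernel ridge regression stands in for the paper's smoothing spline.
- The global statistic is taken to be the sup-norm of the estimated mean difference.
- The function names (`fit_group`, `two_sample_sup_test`, `simulate_group`) and all tuning values are invented for illustration.

Because each estimate is a linear smoother, it splits into per-subject residual contributions. Multiplying those contributions by i.i.d. Gaussian weights, one per subject, mimics each group's sampling variability separately, so no common covariance is assumed.

```python
import numpy as np

# Hypothetical sketch: Gaussian-kernel ridge regression stands in for the
# paper's smoothing-spline estimator; the sup-norm statistic is an assumption.

def gaussian_kernel(s, t, bandwidth):
    """Kernel matrix with entries exp(-(s_a - t_b)^2 / (2 * bandwidth^2))."""
    d = s[:, None] - t[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def smoother_weights(times, targets, lam, bandwidth):
    """Rows w(t) such that fhat(t) = w(t) @ y for kernel ridge on pooled data."""
    A = gaussian_kernel(times, times, bandwidth) + lam * times.size * np.eye(times.size)
    # fhat(targets) = K(targets, times) A^{-1} y; A is symmetric, so solve then transpose.
    return np.linalg.solve(A, gaussian_kernel(times, targets, bandwidth)).T

def fit_group(subjects, grid, lam=1e-3, bandwidth=0.1):
    """Pool one group's (times, values) pairs; return mean on grid and per-subject terms."""
    times = np.concatenate([t for t, _ in subjects])
    values = np.concatenate([y for _, y in subjects])
    ids = np.concatenate([np.full(t.size, i) for i, (t, _) in enumerate(subjects)])
    W = smoother_weights(times, grid, lam, bandwidth)
    resid = values - smoother_weights(times, times, lam, bandwidth) @ values
    # Column i: subject i's residuals pushed through the smoother weights.
    contrib = np.column_stack(
        [W[:, ids == i] @ resid[ids == i] for i in range(len(subjects))]
    )
    return W @ values, contrib

def two_sample_sup_test(group1, group2, grid, lam=1e-3, bandwidth=0.1, n_boot=500, seed=0):
    """Sup-norm statistic with a subject-level Gaussian multiplier bootstrap p-value."""
    rng = np.random.default_rng(seed)
    mu1, c1 = fit_group(group1, grid, lam, bandwidth)
    mu2, c2 = fit_group(group2, grid, lam, bandwidth)
    stat = np.max(np.abs(mu1 - mu2))
    # One N(0, 1) multiplier per subject, drawn independently for each group.
    e1 = rng.standard_normal((c1.shape[1], n_boot))
    e2 = rng.standard_normal((c2.shape[1], n_boot))
    boot = np.max(np.abs(c1 @ e1 - c2 @ e2), axis=0)
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, p_value

def simulate_group(n, mean_fn, eigen_fn, rng, n_max=10, noise=0.5):
    """Toy sparse data: 2..n_max random times per subject, one random effect each."""
    subjects = []
    for _ in range(n):
        m = rng.integers(2, n_max + 1)
        t = np.sort(rng.uniform(0.0, 1.0, m))
        y = mean_fn(t) + rng.normal() * eigen_fn(t) + rng.normal(0.0, noise, m)
        subjects.append((t, y))
    return subjects

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    grid = np.linspace(0.0, 1.0, 101)
    mean = lambda t: np.sin(2 * np.pi * t)
    # Equal means but different covariance structures, so H0 holds.
    g1 = simulate_group(100, mean, lambda t: np.sqrt(2) * np.sin(np.pi * t), rng)
    g2 = simulate_group(100, mean, lambda t: np.sqrt(2) * np.cos(np.pi * t), rng)
    stat, p = two_sample_sup_test(g1, g2, grid)
    print(f"sup |mu1 - mu2| = {stat:.3f}, bootstrap p-value = {p:.3f}")
```

The sketch draws one multiplier per subject rather than per observation, which keeps within-subject dependence intact. It draws the two groups' multipliers independently, which lets their covariance structures differ. In practice the bandwidth and `lam` would need tuning (for example, by cross-validation); this sketch omits that step.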

Abstract

We propose a novel test procedure for comparing mean functions across two groups within the reproducing kernel Hilbert space (RKHS) framework. The proposed method handles sparsely and irregularly sampled functional data in which observation times are random for each subject. Conventional approaches, which are built on functional principal component analysis, usually assume a homogeneous covariance structure across groups. However, this assumption can be difficult to justify in real-world applications. To remove the need for a homogeneous covariance structure, we first develop a linear approximation for the mean estimator under the RKHS framework. This approximation is a sum of i.i.d. random elements, which naturally leads to the desired pointwise limiting distributions. Moreover, we establish weak convergence of the mean estimator, which allows us to construct a test statistic for the mean difference. Our method is easy to implement and controls the type I error better than some conventional tests across various settings. We demonstrate the finite-sample performance of our approach through extensive simulations and two real-world applications.
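The step from an i.i.d. linear approximation to pointwise inference, and the reason it needs no common covariance, can be illustrated schematically. The notation below ($\psi_{gi}$, $R_g$, $\sigma_g^2$) is ours and hypothetical. The display ignores the remainder and smoothing bias, so it is not the paper's exact theorem. Suppose that for each group $g \in \{1, 2\}$

$$
\hat\mu_g(t) - \mu_g(t) = \frac{1}{n_g}\sum_{i=1}^{n_g} \psi_{gi}(t) + R_g(t),
$$

where $\psi_{g1}, \ldots, \psi_{gn_g}$ are i.i.d. mean-zero random functions and the two groups are independent. Then the variance of the estimated difference is a sum of two separately estimable terms:

$$
\operatorname{Var}\{\hat\mu_1(t) - \hat\mu_2(t)\} \approx \frac{\sigma_1^2(t)}{n_1} + \frac{\sigma_2^2(t)}{n_2}, \qquad \sigma_g^2(t) = \operatorname{Var}\{\psi_{g1}(t)\}.
$$

This suggests the pointwise interval $\hat\mu_1(t) - \hat\mu_2(t) \pm z_{1-\alpha/2}\{\hat\sigma_1^2(t)/n_1 + \hat\sigma_2^2(t)/n_2\}^{1/2}$. Nothing in this argument requires $\sigma_1^2 = \sigma_2^2$ or a shared covariance operator.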

Paper Structure

This paper contains 29 sections, 19 theorems, 237 equations, 17 figures, 6 tables, and 1 algorithm.

Key Result

Proposition 2.1

The solution to the minimization problem (eq:mean_optimization) admits a finite-dimensional representation, determined by coefficients $d_{g1}, d_{g2}, \ldots, d_{gk}$ and $c_{g11}, c_{g12}, \ldots, c_{gn_gN_{gn_g}}$.
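A representer-theorem form consistent with these coefficients is sketched below. The notation is assumed rather than taken from the source:

- $\phi_1, \ldots, \phi_k$ span the $k$-dimensional null space of the roughness penalty.
- $K_1$ is the reproducing kernel of the orthogonal complement of that null space.
- $T_{gij}$ is the $j$-th of the $N_{gi}$ observation times of subject $i$ in group $g$.

$$
\hat\mu_g(t) = \sum_{\nu=1}^{k} d_{g\nu}\,\phi_\nu(t) + \sum_{i=1}^{n_g}\sum_{j=1}^{N_{gi}} c_{gij}\,K_1(T_{gij}, t).
$$

Suppose the objective is a penalized least-squares criterion. Substituting this form back into it turns a search over an infinite-dimensional function space into a finite linear system for the $d$ and $c$ coefficients, as with classical smoothing splines.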

Figures (17)

  • Figure 1: In every panel, the red dot-dashed line marks the nominal coverage probability. The actual pointwise coverage probabilities for $N_{\max} = 10$ and $N_{\max} = 18$ are shown as black solid and blue long-dashed lines, respectively. Rows correspond, from top to bottom, to $n_2 = 100$, $200$, and $400$ subjects in group 2.
  • Figure 2: Rejection rates across 1000 Monte Carlo runs at the 5% significance level with $N_{\max} = 10$ and $n_1 = 100$ under setting c.1.
  • Figure 3: Rejection rates across 1000 Monte Carlo runs at the 5% significance level with $N_{\max} = 10$ and $n_1 = 100$ under setting c.2.
  • Figure 4: Rejection rates across 1000 Monte Carlo runs at the 5% significance level with $N_{\max} = 10$ and $n_1 = 100$ under setting c.3.
  • Figure 5: FA profiles along the CC tract for all participants, with one control subject highlighted as a red dot-dashed line and one case subject as a blue dashed line.

Theorems & Definitions (37)

  • Proposition 2.1
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Lemma 3.5
  • Theorem 3.1: functional Bahadur representation
  • Theorem 3.2
  • Remark 3.1
  • Corollary 3.1