From sparse to dense functional data in high dimensions: Revisiting phase transitions from a non-asymptotic perspective

Shaojun Guo, Dong Li, Xinghao Qiao, Yizhu Wang

TL;DR

This work studies nonparametric mean and covariance estimation for high-dimensional, partially observed functional data using a unified local linear smoothing framework. It establishes non-asymptotic, generalized sub-Gaussian concentration bounds in both $L_2$ and supremum norms, derives elementwise maximum convergence rates, and reveals scaled phase transitions as the average sampling frequency per subject $T$ grows relative to $n$ and $\log p$. The results underpin procedures based on functional principal component analysis (FPCA), including FPCA, sparse FPCA, and functional thresholding, by providing sharp rates for covariance estimation and eigenstructure recovery in high dimensions. Simulations corroborate the theory, showing phase-transition-like behavior across sparse, semi-dense, and ultra-dense regimes and illustrating the impact of $\log p$ on error rates. The framework extends previous asymptotic phase transition analyses to a non-asymptotic, high-dimensional setting, enabling rigorous guarantees for downstream functional data analysis in applications with many functional variables.

Abstract

Nonparametric estimation of the mean and covariance functions is ubiquitous in functional data analysis, and local linear smoothing is the most frequently used technique. Zhang and Wang (2016) explored several asymptotic properties of these estimators, revealing phase transition phenomena based on the relative order of the average sampling frequency per subject $T$ to the number of subjects $n$, which partition the data into three categories: "sparse", "semi-dense", and "ultra-dense". In the increasingly common high-dimensional setting, where the number of functional variables $p$ is large in relation to $n$, we revisit this open problem from a non-asymptotic perspective by deriving comprehensive concentration inequalities for the local linear smoothers. Besides being of independent interest, our non-asymptotic results lead to elementwise maximum rates of $L_2$ convergence and uniform convergence, which serve as a fundamental tool for further convergence analysis when $p$ grows exponentially with $n$ and possibly $T$. With extra $\log p$ terms present to account for the high-dimensional effect, we then investigate the scaled phase transitions and the corresponding elementwise maximum rates from sparse to semi-dense to ultra-dense functional data in high dimensions. We also discuss a couple of applications of our theoretical results. Finally, numerical studies are carried out to confirm the established theoretical properties.
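
To make the local linear smoother mentioned in the abstract concrete, the following is a minimal sketch of a pooled local linear estimate of a single mean function. It is not the authors' implementation: the Epanechnikov kernel, the equal weight given to every observation, and the names `epanechnikov` and `local_linear_mean` are all assumptions made for illustration.

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel, supported on [-1, 1] (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - x**2, 0.0, None)

def local_linear_mean(u_grid, times, values, h):
    """Local linear estimate of a mean function on the points in u_grid.

    times, values: lists of 1-D arrays, one pair per subject.
    All observations are pooled with equal weight (an assumption).
    h: bandwidth.
    """
    t = np.concatenate(times)
    y = np.concatenate(values)
    est = np.empty(len(u_grid))
    for k, u in enumerate(u_grid):
        d = t - u                          # centred observation times
        w = epanechnikov(d / h) / h        # kernel weights
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d**2).sum()
        r0, r1 = (w * y).sum(), (w * d * y).sum()
        denom = s0 * s2 - s1**2
        # Intercept of the weighted least-squares line fitted around u.
        est[k] = (s2 * r0 - s1 * r1) / denom if denom > 1e-12 else np.nan
    return est
```

A useful sanity check is that a local linear fit reproduces an exactly linear mean function without bias, whatever the bandwidth, as long as each window contains enough observations.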

Paper Structure (23 sections, 6 theorems, 145 equations, 2 figures)

Key Result

Theorem 2

Suppose that Assumptions cond_subG-cond_kernel hold. For each $j \in [p]$, let $\gamma_{n,T,h,j} = n (1 \wedge \widebar{T}_{\mu,j} h_{\mu,j})$, where $\widebar{T}_{\mu,j} = n^{-1} \sum_{i=1}^n T_{ij}$ is the corresponding average sampling frequency per subject. Then there exist some positive constants $c_1$ ... where $\tilde{\mu}_j(u)$ is a deterministic univariate function that converges to $\mu_j(u)$ as $h_{\mu,j}$ ...
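
The quantity $\gamma_{n,T,h,j}$ defined above can be computed directly from the per-subject observation counts and the bandwidth. The sketch below does only that; the function name `gamma_rate` is hypothetical, and it uses nothing from the theorem beyond the stated definition $\gamma_{n,T,h,j} = n (1 \wedge \widebar{T}_{\mu,j} h_{\mu,j})$.

```python
import numpy as np

def gamma_rate(T_counts, h):
    """Compute gamma = n * min(1, T_bar * h) for one functional variable j.

    T_counts: length-n sequence holding T_ij, the number of observations
              of variable j for each subject i.
    h:        bandwidth h_{mu,j} of the mean estimator for variable j.
    """
    T_counts = np.asarray(T_counts, dtype=float)
    n = T_counts.size
    T_bar = T_counts.mean()            # average sampling frequency per subject
    return n * min(1.0, T_bar * h)
```

For example, with $n = 100$ subjects and bandwidth $0.1$, four observations per subject give $\gamma = 100 \times 0.4 = 40$, whereas fifty observations per subject give $\gamma = 100$: once $\widebar{T}_{\mu,j} h_{\mu,j} \ge 1$, the minimum is attained at $1$ and $\gamma$ depends on $n$ alone.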

Figures (2)

  • Figure 1: Plots of average MaxMISE (black) and AveMISE (red) against $T$ with $p=50$ (solid), $100$ (dashed) and $150$ (dotted) for mean estimators (left) and covariance estimators (right).
  • Figure 2: Plots of average $\log(\text{AveMISE})$ against $\log n$ (left) and average $\log(\text{MaxMISE})$ against $\log(n/\log p)$ (right) for mean estimators (top) and covariance estimators (bottom) with $p=50,100,150$ and $n=50,100,150,200,250$. The colored dashed lines correspond to different values of $T$ ranging from 3 to 140, and the estimated slopes of the corresponding linear fits, based on five points for $\log n$ or fifteen points for $\log(n/\log p)$, are also displayed. The slope of the black solid line represents the theoretical value $-1/2$ (the intercept is irrelevant here).
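
The slope estimates described in the Figure 2 caption amount to an ordinary least-squares fit on a log-log scale. The sketch below shows one way such a slope could be computed; the function name `loglog_slope` is hypothetical, and the error values are synthetic, constructed to decay exactly like $(n/\log p)^{-1/2}$. They are not results from the paper.

```python
import numpy as np

def loglog_slope(n_values, p_values, errors):
    """Least-squares slope of log(error) against log(n / log p)."""
    n = np.asarray(n_values, dtype=float)
    p = np.asarray(p_values, dtype=float)
    x = np.log(n / np.log(p))
    y = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)   # degree-1 fit: [slope, intercept]
    return slope

# Synthetic illustration only: fifteen (n, p) pairs from the caption's grid,
# with errors built to follow the theoretical rate exactly.
n_grid, p_grid = np.meshgrid([50, 100, 150, 200, 250], [50, 100, 150])
n_flat, p_flat = n_grid.ravel(), p_grid.ravel()
synthetic = 2.0 * (n_flat / np.log(p_flat)) ** -0.5
slope = loglog_slope(n_flat, p_flat, synthetic)
```

On these noiseless synthetic inputs the fitted slope matches the theoretical value $-1/2$ up to floating-point error; with simulated errors the fit would only be approximate.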

Theorems & Definitions (11)

  • Remark 1
  • Theorem 2
  • Theorem 3
  • Remark 4
  • Remark 5
  • Theorem 6
  • Theorem 7
  • Remark 8
  • Remark 9
  • Proposition 10
  • ...and 1 more