Adaptive functional principal components analysis

Sunny G. W. Wang, Valentin Patilea, Nicolas Klutchnikoff

TL;DR

This work introduces an adaptive functional principal components analysis (FPCA) framework that exploits replication across curves to estimate local path regularity and selects a separate smoothing bandwidth for each eigen-element via sharp risk bounds. Using a "first smooth, then estimate" pipeline with a diagonal bias-corrected covariance estimator, the method derives explicit, data-driven bandwidth rules that adapt to the local Hölder regularity parameters $H_t$ and $L_t$ and to the design regime. Theoretical results establish risk bounds and convergence rates for eigenvalues and eigenfunctions, with feasible plug-in bounds shown to preserve these rates. Numerical experiments, including a general-purpose simulator and a real electricity consumption dataset, demonstrate substantial gains in accuracy and computational efficiency over existing FPCA approaches, and the accompanying FDAdapt package enables practical deployment.
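
For orientation, the objects named above can be written out as follows. This is a sketch in standard FPCA notation, not a statement copied from the paper: $\Gamma$, $\lambda_j$ and $\psi_j$ match the symbols used in the figure captions and the Key Result, while the domain $\mathcal{T}$ and the exact form of the local regularity condition are assumptions made here for illustration.

$$
\Gamma(s,t) = \operatorname{Cov}(X_s, X_t), \qquad \int_{\mathcal{T}} \Gamma(s,t)\,\psi_j(t)\,dt = \lambda_j\,\psi_j(s), \qquad \lambda_1 \geq \lambda_2 \geq \dots \geq 0 .
$$

One common way to formalize local Hölder regularity of the sample paths near a point $t$ (assumed here) is

$$
\mathbb{E}\big[(X_u - X_v)^2\big] \approx L_t^2\,|u-v|^{2H_t} \qquad \text{for } u, v \text{ close to } t,
$$

so that $H_t \in (0,1]$ acts as a local exponent and $L_t > 0$ as a local constant. Because many curves are observed, both quantities can in principle be estimated by averaging squared increments across curves.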

Abstract

Functional data analysis almost always involves smoothing discrete observations into curves, because the curves are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting these parameters are not well developed, frustrated by the difficulty of using all the information shared by curves while remaining computationally efficient. On the one hand, smoothing individual curves in an isolated, albeit sophisticated, way ignores useful signals present in other curves. On the other hand, bandwidth selection by automatic procedures such as cross-validation after pooling all the curves together quickly becomes computationally infeasible due to the large number of data points. In this paper we propose a new data-driven, adaptive kernel smoothing method, specifically tailored for functional principal components analysis through the derivation of sharp, explicit risk bounds for the eigen-elements. The minimization of these quadratic risk bounds provides refined, yet computationally efficient, bandwidth rules for each eigen-element separately. Both common and independent design cases are allowed. Rates of convergence for the estimators are derived. An extensive simulation study, designed in a versatile manner to closely mimic the characteristics of real data sets, supports our methodological contribution. An illustration on a real data application is provided.
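
To make the "first smooth, then estimate" idea concrete, here is a minimal Python sketch under simplifying assumptions. It is not the authors' estimator and does not use the FDAdapt package: it applies a single fixed bandwidth `h` to every curve (the paper selects a bandwidth per eigen-element from risk bounds), uses a Gaussian Nadaraya-Watson smoother, and omits the diagonal bias correction of the covariance estimator. The function names `nw_smooth` and `smooth_then_fpca`, and the simulated two-component process, are hypothetical and chosen only for illustration.

```python
import numpy as np

def nw_smooth(t_obs, y_obs, grid, h):
    """Nadaraya-Watson estimate of one curve on `grid` (Gaussian kernel)."""
    u = (grid[:, None] - t_obs[None, :]) / h   # (G, M) scaled distances
    w = np.exp(-0.5 * u**2)                    # kernel weights, strictly positive
    return (w @ y_obs) / w.sum(axis=1)

def smooth_then_fpca(curves, grid, h, n_comp=2):
    """curves: list of (t_obs, y_obs) pairs; grid: equally spaced points.

    Assumption: one fixed bandwidth h for all curves, no diagonal correction.
    """
    smoothed = np.vstack([nw_smooth(t, y, grid, h) for t, y in curves])
    centered = smoothed - smoothed.mean(axis=0)
    cov = centered.T @ centered / len(curves)  # empirical covariance on the grid
    dt = grid[1] - grid[0]
    evals, evecs = np.linalg.eigh(cov * dt)    # discretized covariance operator
    order = np.argsort(evals)[::-1][:n_comp]   # largest eigenvalues first
    lam = evals[order]
    psi = evecs[:, order].T / np.sqrt(dt)      # rows have unit L2 norm (Riemann sum)
    return lam, psi

# Illustrative data: two-component process, random design points, noisy observations.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
curves = []
for _ in range(200):
    t = np.sort(rng.uniform(0.0, 1.0, size=50))
    xi = rng.normal(size=2) * np.sqrt([2.0, 0.5])   # scores with variances 2 and 0.5
    x = (xi[0] * np.sqrt(2) * np.sin(2 * np.pi * t)
         + xi[1] * np.sqrt(2) * np.cos(2 * np.pi * t))
    curves.append((t, x + 0.1 * rng.normal(size=50)))

lam, psi = smooth_then_fpca(curves, grid, h=0.05)
print(lam)
```

In this toy setup the estimated eigenvalues can be compared with the score variances 2 and 0.5 used in the simulation. Changing `h` shifts both estimates, which is the sensitivity to smoothing parameters that motivates the data-driven bandwidth rules described in the abstract.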

Paper Structure (23 sections, 5 theorems, 79 equations, 9 figures, 2 tables)

This paper contains 23 sections, 5 theorems, 79 equations, 9 figures, 2 tables.

Key Result

Theorem 1

Let Assumptions ass_data and ass_ad_smo in Appendix sec_ap:ass hold true, and let $\mathcal{H}_N$ be a bandwidth range as in Assumption ass_ad_smo. For $j\geq 2$, assume [...], and assume $\lambda_1-\lambda_{2}>0$ when $j=1$. Then, uniformly over $\mathcal{H}_N$, [...]. Moreover, given a constant $C>0$, an integer $K_0$ depending on $C$ exists such that, for $\mathcal{B}_N(\widehat{\psi}_j;h)$ defined as in eq:efunction, [...]

Figures (9)

  • Figure 1: Simulation DGP: regularity parameters $H$ (left), $L$ (middle) and the variance function $\nu$ of $X$ (right)
  • Figure 2: Simulation DGP: mean $\mu$ (left), conditional standard deviation $\sigma$ (middle), Signal-to-Noise Ratio (right)
  • Figure 3: Simulation DGP: covariance function $\Gamma$ (left), the first two eigenfunctions (middle and right)
  • Figure 4: Our method compared to the one in fdapaceR: the ratio $\mathcal{R}_{ZW, 2}(\lambda_j)$ of the absolute errors of the eigenvalue estimates for $\sigma_0 = 0.25$, $b_{ZW, 2} = 0.1$, for different values of $N$ (number of curves) and $\mathfrak{m}$ (average number of random design points along each curve). Results from 500 replications.
  • Figure 5: Our method compared to the one in fdapaceR: the ratio $\mathcal{R}_{ZW, 2}(\psi_j)$ of the $L^2$-norm errors of the eigenfunction estimates. The same simulation setup as in Figure 4.
  • ...and 4 more figures

Theorems & Definitions (10)

  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2
  • Definition 1
  • Theorem 3
  • Proof of Theorem 1
  • Proof of Corollary 1
  • Proof of Corollary 2
  • Proof of Theorem 2