Adaptive estimation of irregular mean and covariance functions
Steven Golovkine, Nicolas Klutchnikoff, Valentin Patilea
TL;DR
This paper addresses nonparametric estimation of the mean and covariance functions of functional data observed with measurement error at discrete times, when the local regularity of the trajectories is unknown. It introduces a ``smoothing first, then estimate'' framework that uses a data-driven local regularity estimator $\widehat{H}_t$ to adapt the bandwidths used for trajectory smoothing, which yields adaptive mean $\widehat{\mu}_N^*$ and covariance $\widehat{\Gamma}_N^*$ estimators. The key contributions include a concentration bound for $\widehat{H}_t$, a penalized risk-based bandwidth selection that accounts for curve dropouts, and a unified approach applicable to independent or common designs in both sparse and dense regimes. Empirical results on simulations inspired by multifractional Brownian motion (MfBm) show competitive or superior performance relative to established methods. The methodology provides robust, easy-to-implement tools for functional data analysis (FDA) that leverage replication to adapt automatically to unknown trajectory regularity and heteroscedastic measurement errors. The work lays the groundwork for extensions to smoother paths and possible central limit theorems, with practical relevance for the analysis of irregular real-world functional data.
Abstract
Nonparametric estimators for the mean and the covariance functions of functional data are proposed. The setup covers a wide range of practical situations. The random trajectories are not necessarily differentiable, have unknown regularity, and are measured with error at discrete design points. The measurement error could be heteroscedastic. The design points could be either randomly drawn or common to all curves. The estimators depend on the local regularity of the stochastic process generating the functional data. We consider a simple estimator of this local regularity which exploits the replication and regularization features of functional data. Next, we use the ``smoothing first, then estimate'' approach for the mean and the covariance functions. The resulting estimators can be applied to both sparsely and densely sampled curves, are easy to calculate and to update, and perform well in simulations. Simulations built upon a real data set illustrate the effectiveness of the new approach.
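
As a rough illustration of the ``smoothing first, then estimate'' idea for the mean function, the following minimal sketch smooths each noisy curve separately and then averages the smoothed curves pointwise. It is not the adaptive estimator of the paper: the bandwidth `h` is fixed by hand here, whereas the paper selects it from the estimated local regularity $\widehat{H}_t$. The Nadaraya-Watson smoother, the Epanechnikov kernel, the function names `nw_smooth` and `smooth_then_mean`, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def nw_smooth(t_obs, y_obs, t_grid, h):
    """Nadaraya-Watson smoother with an Epanechnikov kernel.

    Returns NaN at grid points with no observation inside the window,
    so that the curve can be dropped there when averaging.
    """
    u = (t_grid[:, None] - t_obs[None, :]) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    denom = w.sum(axis=1)
    num = w @ y_obs
    out = np.full(t_grid.shape, np.nan)
    ok = denom > 0
    out[ok] = num[ok] / denom[ok]
    return out


def smooth_then_mean(curves, t_grid, h):
    """Smooth each curve, then average the smoothed curves pointwise.

    `curves` is a list of (t_obs, y_obs) pairs. At each grid point the
    average is taken only over curves whose smoother is defined there.
    """
    smoothed = np.vstack([nw_smooth(t, y, t_grid, h) for t, y in curves])
    valid = ~np.isnan(smoothed)
    counts = valid.sum(axis=0)
    sums = np.where(valid, smoothed, 0.0).sum(axis=0)
    mean = np.full(t_grid.shape, np.nan)
    mean[counts > 0] = sums[counts > 0] / counts[counts > 0]
    return mean, smoothed


# Hypothetical simulated data: random design, noisy observations.
rng = np.random.default_rng(0)
grid = np.linspace(0.1, 0.9, 21)
curves = []
for _ in range(200):
    t = np.sort(rng.uniform(0.0, 1.0, size=50))
    shift = rng.normal(0.0, 0.1)             # curve-level random effect
    y = np.sin(2 * np.pi * t) + shift + rng.normal(0.0, 0.1, size=t.size)
    curves.append((t, y))

mu_hat, _ = smooth_then_mean(curves, grid, h=0.1)
```

The NaN handling loosely mirrors the curve dropouts mentioned in the TL;DR: a curve contributes to the average at a point only if it has observations near that point. A covariance estimate could be built from the same smoothed curves in an analogous way, although any diagonal correction would need to follow the paper rather than this sketch.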
