Robust Functional Principal Component Analysis for Non-Euclidean Random Objects

Jiazhen Xu, Andrew T. A. Wood, Tao Zou

TL;DR

This work addresses robust analysis of time-varying non-Euclidean objects by transforming object-valued curves into Fréchet median distance trajectories and applying a Winsorized $U$-statistic-based FPCA. The key methodological advances include a robust autocovariance operator $C_{WPU}$ whose eigenfunctions align with the standard case, and a data-driven mechanism to maintain robustness against outliers via a cutoff $Q$ and radius function $\xi$. Theoretical guarantees cover uniform convergence of the Fréchet median, asymptotic Gaussian behavior of the estimator, and robustness properties with explicit breakdown points. Empirical evidence from a NYC Citi Bike case study and simulations shows improved robustness to outliers while preserving competitive performance when data are clean, highlighting the method's practical value for dynamic networks and other non-Euclidean time-varying objects.
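To make the idea of a Winsorized $U$-statistic covariance concrete, here is a minimal sketch. It is not the paper's definition of $C_{WPU}$. It assumes the curves are observed on a common equispaced grid on $[0,1]$, takes the radius function to be simple clipping, $\xi(r)=\min(r,Q)$, and picks the cutoff $Q$ as an empirical quantile of the pairwise $L^2$ distances. The function name `winsorized_pairwise_cov` and the parameter `q_level` are hypothetical.

```python
import numpy as np

def winsorized_pairwise_cov(X, q_level=0.8):
    """Sketch of a Winsorized pairwise-difference covariance estimate.

    X       : (n, m) array, n curves on a common equispaced grid of m points.
    q_level : quantile of the pairwise L2 distances used as cutoff Q
              (an assumption made for this illustration).
    Returns the (m, m) covariance matrix on the grid and the cutoff Q.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    i, j = np.triu_indices(n, k=1)            # all pairs with i < j
    D = X[i] - X[j]                           # pairwise differences
    norms = np.sqrt(np.mean(D**2, axis=1))    # Riemann-sum L2 norm on [0, 1]
    Q = np.quantile(norms, q_level)

    # Winsorize: differences longer than Q are shrunk to length Q,
    # shorter ones are left unchanged.
    scale = np.ones_like(norms)
    big = norms > Q
    scale[big] = Q / norms[big]
    Dw = D * scale[:, None]

    # U-statistic form: average outer product over pairs. The factor 1/2
    # recovers the usual sample covariance when nothing is Winsorized.
    C = (Dw.T @ Dw) / (2 * len(i))
    return C, Q
```

With `q_level=1.0` no pair is shrunk and the result coincides with `np.cov(X, rowvar=False)`. Grid-level eigenfunction estimates can then be read off from `np.linalg.eigh(C)`, which returns eigenvalues in ascending order.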

Abstract

Functional data analysis offers a diverse toolkit of statistical methods tailored for analyzing samples of real-valued random functions. Recently, samples of time-varying random objects, such as time-varying networks, have been increasingly encountered in modern data analysis. These data structures represent elements within general metric spaces that lack local or global linear structures, rendering traditional functional data analysis methods inapplicable. Moreover, the existing methodology for time-varying random objects does not work well in the presence of outlying objects. In this paper, we propose a robust method for analysing time-varying random objects. Our method employs pointwise Fréchet medians and then constructs pointwise distance trajectories between the individual time courses and the sample Fréchet medians. This representation effectively transforms time-varying objects into functional data. A novel robust approach to functional principal component analysis based on a Winsorized U-statistic estimator of the covariance structure is introduced. The proposed robust analysis of these distance trajectories is able to identify key features of time-varying objects and is useful for downstream analysis. To illustrate the efficacy of our approach, numerical studies focusing on dynamic networks are conducted. The results indicate that the proposed method exhibits good all-round performance and surpasses the existing approach in terms of robustness, showcasing its superior performance in handling time-varying objects data.
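The step that turns object-valued curves into real-valued functional data can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: the objects are taken to be square matrices (for example graph Laplacians) compared with the Frobenius metric, and the sample Fréchet median at each time point is approximated by the sample point that minimises the sum of distances to all others (a medoid). The names `frobenius` and `distance_trajectories` are hypothetical.

```python
import numpy as np

def frobenius(a, b):
    """Frobenius distance between two matrices (assumed metric)."""
    return np.linalg.norm(a - b)

def distance_trajectories(objs, dist=frobenius):
    """Distances from each object-valued curve to a pointwise sample median.

    objs : (n, T, p, p) array, n curves of p x p matrices at T time points.
    Returns an (n, T) array of real-valued distance trajectories.
    """
    n, T = objs.shape[:2]
    traj = np.empty((n, T))
    for t in range(T):
        # Pairwise distance matrix between the n objects at time t
        Dt = np.array([[dist(objs[i, t], objs[j, t]) for j in range(n)]
                       for i in range(n)])
        # Medoid approximation of the Frechet median: the sample point
        # with the smallest total distance to the rest of the sample
        med = np.argmin(Dt.sum(axis=1))
        traj[:, t] = Dt[:, med]
    return traj
```

Each row of the returned array is an ordinary real-valued function of time, so any functional principal component method can be applied to it. Restricting the median to sample points keeps the sketch metric-agnostic; only `dist` needs to change for other object spaces.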

Paper Structure (11 sections, 9 theorems, 29 equations, 5 figures)

This paper contains 11 sections, 9 theorems, 29 equations, 5 figures.

Key Result

proposition 1

Under Assumption condition c1.5, for any fixed $t$, if $\bar{M}_n(\omega, t) - \bar{M}_n(\mu_{\rm GM}(t), t)>0$ for all $\omega\in B_\delta(\mu_{\rm GM}(t))$, where $B_\delta(\mu_{\rm GM}(t))=\{\omega\in\Omega:d(\omega,\mu_{\rm GM}(t))<\delta\}$, then $d(\hat{\mu}_{\rm GM}(t),\mu_{\rm GM}(t))<\delta$ ...

Figures (5)

  • Figure 1: Sample mean function (left plot) and eigenfunctions for the robust functional component analysis (right plot) of the distance trajectories at 20-minute intervals of graph Laplacians of daily Citi Bike trip networks in New York City. In the right plot, the red line represents the first eigenfunction, which accounts for 59.25% of variability in the trajectories and the blue line represents the second eigenfunction, explaining 13.54% of the variability.
  • Figure 2: Pairwise plots of the first two functional principal component scores, distinguished by day of the week (left plot), by season (middle plot), and by year (right plot).
  • Figure 3: Estimated eigenfunctions of the proposed autocovariance and the autocovariance in Dubey and Müller's method, abbreviated as DM, with outliers included versus outliers removed. The solid black lines represent the estimated eigenfunctions with data including outliers, and the red dashed lines represent those with data excluding outliers.
  • Figure 4: Sample mean function (left plot), eigenfunctions (middle plot), and scatter plot of the first two functional principal component scores (right plot) obtained from the robust functional component analysis of the distance trajectories for networks.
  • Figure 5: Simulated data with outliers generated by extreme values with a shift in time-varying connectivity weights. Mean absolute angle (left), empirical bias (middle), and mean squared error (right) of the leading eigenfunction. DM stands for Dubey and Müller's method.

Theorems & Definitions (12)

  • proposition 1
  • theorem 1
  • theorem 2
  • theorem 3
  • remark 1
  • remark 2
  • theorem 4
  • corollary 1
  • remark 3
  • theorem 5
  • ...and 2 more