Robust functional PCA for relative data

Jeremy Oguamalam, Peter Filzmoser, Karel Hron, Alessandra Menafoglio, Una Radojičić

TL;DR

This work addresses the challenge of robustly extracting principal modes of variation from relative functional data, such as density curves, within the Bayes space framework. It extends the Mahalanobis distance to Bayes spaces through regularized standardization, defining the regularized density Mahalanobis distance (RDMD) and establishing its connection to existing functional distances. Building on this, the authors develop Robust Density PCA (RDPCA), featuring trimmed Bayes covariance estimation and an iterative HD-based subset selection algorithm to obtain robust covariance operators and principal components for densities. The method is validated via simulations with contamination and two real-data applications (EPXMA spectra and fertility densities), demonstrating improved covariance estimation, more accurate PCs, and effective outlier detection, with discussion of extensions to sparse data and multivariate densities.

Abstract

This paper introduces a robust approach to functional principal component analysis (FPCA) for relative data, particularly density functions. While recent papers have studied density data within the Bayes space framework, there has been limited focus on developing robust methods to effectively handle anomalous observations and large noise. To address this, we extend the Mahalanobis distance concept to Bayes spaces, proposing its regularized version that accounts for the constraints inherent in density data. Based on this extension, we introduce a new method, robust density principal component analysis (RDPCA), for more accurate estimation of functional principal components in the presence of outliers. The method's performance is validated through simulations and real-world applications, showing its ability to improve covariance estimation and principal component analysis compared to traditional methods.
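The Bayes space framework mentioned above usually handles densities through the centred log-ratio (clr) transformation, which maps a positive density on an interval $I$ to a function in $L^2(I)$ that integrates to zero. The following is a minimal numerical sketch of that transformation and its inverse on a grid, not the authors' implementation. The helper names (`trapezoid`, `clr`, `clr_inverse`), the grid, and the Gaussian-shaped example density are all illustrative assumptions.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule along the last axis (hypothetical helper)."""
    dt = np.diff(t)
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dt, axis=-1)

def clr(density, t):
    """clr(f)(t) = log f(t) - (1/|I|) * integral of log f over I.

    Assumes the density is strictly positive on the grid t.
    """
    log_f = np.log(density)
    mean_log = trapezoid(log_f, t) / (t[-1] - t[0])
    return log_f - np.expand_dims(mean_log, -1)

def clr_inverse(g, t):
    """Map a clr-transformed curve back to a density with unit integral."""
    f = np.exp(g)
    return f / np.expand_dims(trapezoid(f, t), -1)

# Illustrative density on I = [0, 1]: a Gaussian bump, renormalized on the grid.
t = np.linspace(0.0, 1.0, 501)
raw = np.exp(-0.5 * ((t - 0.4) / 0.2) ** 2)
density = raw / trapezoid(raw, t)

g = clr(density, t)            # clr-transformed curve, integrates to zero
recovered = clr_inverse(g, t)  # round trip back to the density
```

The zero-integral property of `g` is the constraint on density data that the abstract refers to: a covariance operator estimated from clr-transformed curves is singular along constant functions, which is one reason a Mahalanobis-type distance in this setting needs regularization.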

Paper Structure

This paper contains 20 sections, 7 theorems, 32 equations, 11 figures, 1 algorithm.

Key Result

Lemma 2.1

Let $X\in\mathcal{B}^2(I)$ and $\mathrm{clr}(X)\in L^2(I)$ be its clr transformation. Then the following identities hold:

Figures (11)

  • Figure 1: Visualization of density functions (left) and clr transformed counterparts (right). Solid curves represent the main processes, while the dashed ones indicate the outliers.
  • Figure 2: Mean ($\pm$ standard error) ISE between the estimated and true covariances (left) and cosine similarity between the first five pairs of estimated and true eigenfunctions (right), obtained using RDPCA (solid) and SFPCA (dashed).
  • Figure 3: True correlation function (left) and its estimates from RDPCA with $\alpha=0.32$ (middle) and SFPCA (right), for the example shown in Figure \ref{fig:dataModel2}.
  • Figure 4: Distance-distance plot of squared robust vs. non-robust RDMD (left) and robust $\alpha$-Mahalanobis \cite{berrendero2020} (right) distances for the example of Figure \ref{fig:dataModel2}. The regularization parameter is $\alpha = 0.32$. Dashed lines indicate the corresponding cutoff values under Gaussianity. Circles and triangles correspond to true regular and outlying observations, respectively.
  • Figure 5: Mean ($\pm$ standard error) ISE between the estimated and true covariances (left) and cosine similarity between the first five pairs of estimated and true eigenfunctions (right), obtained using RDPCA (solid) and SFPCA (dashed). The labels “normal” and “t” refer to the distribution of the scores in the underlying model, as given at the beginning of the section.
  • ...and 6 more figures

Theorems & Definitions (12)

  • Lemma 2.1
  • Remark 2.1
  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Definition 3.1: Regularized Mahalanobis distance between two densities
  • Definition 3.2: Regularized Mahalanobis distance for density data
  • Proposition 3.4
  • Corollary 4.1
  • Remark 4.1
  • ...and 2 more