Projection depth for functional data: Practical issues, computation and applications

Filip Bočinec; Stanislav Nagy; Hyemin Yeon

TL;DR

This work investigates practical aspects of the recently proposed regularized projection depth (RPD), which induces a meaningful ordering of functional data while appropriately accommodating their complex shape features, and proposes a random projection-based approach for its efficient computation.

Abstract

Statistical analysis of functional data is challenging due to their complex patterns, for which functional depth provides an effective means of reflecting their ordering structure. In this work, we investigate practical aspects of the recently proposed regularized projection depth (RPD), which induces a meaningful ordering of functional data while appropriately accommodating their complex shape features. Specifically, we examine the impact and choice of its tuning parameter, which regulates the degree of effective dimension reduction applied to the data, and propose a random projection-based approach for its efficient computation, supported by theoretical justification. Through comprehensive numerical studies, we explore a wide range of statistical applications of the RPD and demonstrate its particular usefulness in uncovering shape features in functional data analysis. This ability allows the RPD to outperform competing depth-based methods, especially in tasks such as functional outlier detection, classification, and two-sample hypothesis testing.
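To make the random projection idea concrete, the following is a minimal sketch, not the authors' implementation. This summary does not state the formula for the RPD, so the sketch assumes a projection-depth form: the outlyingness of a curve is the largest value of |<x,v> - med<X,v>| / MAD<X,v> over directions v whose MAD is at least a threshold beta(u), and beta(u) is taken as the u-quantile of the MADs over the random directions. The function name `random_rpd`, its parameters, and the choice of normalized Gaussian directions are all hypothetical.

```python
import numpy as np

def mad(values, axis=0):
    """Median absolute deviation along an axis (no consistency scaling)."""
    med = np.median(values, axis=axis, keepdims=True)
    return np.median(np.abs(values - med), axis=axis)

def random_rpd(x, sample, n_dirs=1000, u=0.1, seed=0):
    """Approximate a regularized projection-type depth of curve x w.r.t. sample.

    x:      array of shape (T,), the curve evaluated on a grid
    sample: array of shape (n, T), the reference curves on the same grid
    ASSUMED form (not given in this summary): sup over directions with
    MAD >= beta(u) of |<x,v> - med<X,v>| / MAD<X,v>, turned into a depth
    via 1 / (1 + outlyingness).
    """
    rng = np.random.default_rng(seed)
    T = sample.shape[1]
    # ASSUMPTION: random directions are normalized i.i.d. Gaussian vectors.
    dirs = rng.standard_normal((n_dirs, T))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj_sample = sample @ dirs.T          # shape (n, n_dirs)
    proj_x = x @ dirs.T                    # shape (n_dirs,)
    med = np.median(proj_sample, axis=0)
    scale = mad(proj_sample, axis=0)
    beta = np.quantile(scale, u)           # quantile-based threshold (assumed)
    keep = (scale >= beta) & (scale > 0)   # drop low-variability directions
    if not keep.any():
        raise ValueError("all projections have zero MAD")
    outlyingness = np.max(np.abs(proj_x[keep] - med[keep]) / scale[keep])
    return 1.0 / (1.0 + outlyingness)
```

Under these assumptions, a larger `u` discards more low-variability directions, which is one way to read the abstract's remark that the tuning parameter regulates the degree of effective dimension reduction.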

Paper Structure (12 sections, 5 theorems, 47 equations, 10 figures, 7 tables)

Introduction: Regularized projection depth
Selection of tuning parameter
Consistency with beta chosen as a quantile
Robustness of the induced median with beta chosen as a quantile
Computation of RPD
Simulations
Outlier detection
Classification
Hypothesis testing
Robust location estimation
Proofs of theoretical results
Simulation study: Supplementary results

Key Result

Lemma 1

Let $X \sim P_X \in \mathcal{P}(\mathbb{H})$ and consider $u\in(0,1)$. Suppose that $V\sim\nu\in\mathcal{P}(\mathbb{S})$ is smooth in the sense that $\nu(H) = 0$ for every hyperplane $H \subset \mathbb{H}$ passing through the origin. If $P_X$ has no atom with probability at least $1/2$, i.e. $P_X(\{x\}) < 1/2$ for every $x \in \mathbb{H}$, then $\beta(u)>0$.

Figures (10)

Figure 1: Convergence rate plots of $D_{(u)}^{(M)}$ with $u = 0.001$ (left), $u = 0.1$ (middle), and $u = 0.5$ (right). The depth is evaluated for a central function $x(t)=0$, $t \in [0,1]$ (red), and for a peripheral function $x(t)=1.5\,\sin(2\pi t)$, $t \in [0,1]$ (blue). The reference dataset consists of $n=50$ independent centered Gaussian trajectories with covariance function $\Sigma(s,t)=\exp\left(-(s-t)^2/0.32\right)$, discretized on a grid of $T=51$ points. The plots show $D_{(u)}^{(M)}(x;\widehat{P}_n)$ as a function of $M$, averaged over $20$ Monte Carlo simulations, with dashed horizontal lines corresponding to $M=10^7$. The stochastic approximation of RPD converges substantially more slowly for central functions than for peripheral ones.
Figure 2: A single random sample of functional observations (gray) generated from Model \ref{ModelD5} with sample size $n = 100$, containing $m = 10$ outlying curves (red) (top left). The same sample is shown with the sample median curve highlighted in red, the $10\%$ deepest observations in orange, and the $10\%$ least deep observations in green, based on RPD with $u = 0.001$ (top right), MBD (bottom left), and SD (bottom right).
Figure 3: Supervised classification: A simulated dataset generated under Models \ref{ModelC1}--\ref{ModelC3} described in Section \ref{sec: Classification}. The functions from class $X$ (the first class) are shown in gray, while the functions from class $Y$ (the second class) are in red.
Figure 4: Supervised classification: DD-plots of the training data in a single run of Model \ref{ModelC3} (gray points for $X_i$, red points for $Y_i$) with (a) RPD (top left, $u = 0.001$), (b) RHD (top right, $u = 0.001$), (c) FD (middle left), (d) ID (middle right), (e) MBD (bottom left), and (f) SD (bottom right). In the plots, the dashed line represents the max-depth classifier and the orange line the linear DD-classifier that best separates the two groups. The results are striking: while RPD with the DD-classifier achieves almost perfect separation of the clouds, FD, MBD, and SD are unable to cope with the shape difference between the clusters. RHD and ID perform slightly better, but the overlap of the two DD-clouds remains substantial.
Figure 5: Hypothesis testing: Empirical power of the KW tests using depth-based rankings. The results of the KW tests induced by depths other than RPD, FD, and SD are not reported, as they do not achieve correct nominal levels; see Table \ref{tb_2test_sizes}. Since the RPD-based tests with $u \in \{0.1, 0.001\}$ show similar results, only the results from the quantile level $u=0.001$ are presented.
...and 5 more figures
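
The reference dataset described in the Figure 1 caption can be simulated along the following lines. This is a minimal sketch: the sample size, grid size, covariance function, and the two evaluated curves are taken from the caption, while the equispaced grid on $[0,1]$, the function name `simulate_gaussian_curves`, and the eigendecomposition-based sampling are assumptions made for illustration.

```python
import numpy as np

def simulate_gaussian_curves(n=50, T=51, seed=0):
    """Draw n centered Gaussian trajectories on a grid of T points in [0, 1]."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: the grid is equispaced; the caption only gives T = 51.
    t = np.linspace(0.0, 1.0, T)
    # Covariance from the caption: Sigma(s, t) = exp(-(s - t)^2 / 0.32)
    cov = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.32)
    # This kernel matrix is numerically near-singular, so build a matrix
    # square root from clipped eigenvalues rather than a Cholesky factor.
    eigval, eigvec = np.linalg.eigh(cov)
    root = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    curves = rng.standard_normal((n, T)) @ root.T
    return t, curves

t, X = simulate_gaussian_curves()
x_central = np.zeros_like(t)                  # central function x(t) = 0
x_peripheral = 1.5 * np.sin(2 * np.pi * t)    # peripheral function
```

Evaluating an approximate depth of `x_central` and `x_peripheral` against `X` for an increasing number of random directions, and averaging over repeated runs, would mimic the kind of convergence plot the caption describes.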

Theorems & Definitions (12)

Definition : Regularized projection depth, [B]
Definition : Regularization based on quantiles
Lemma 1
Lemma 2
Theorem 3
Theorem 4
Theorem 5: Consistency of the random RPD
proof : Proof of Lemma \ref{lemma: betaPos}
proof : Proof of Lemma \ref{lemma: betaConv}
proof : Proof of Theorem \ref{thm: ConsistRegQuantile}
...and 2 more
