Local depth-based classification of directional data

Giuseppe Gismondi, Rebecca Rivieccio, Giuseppe Pandolfo

TL;DR

This work proposes using a local notion of data depth in the DD-plot (Depth vs. Depth plot) to classify directional data.

Abstract

Directional data arise in many applications where observations are naturally represented as unit vectors or, equivalently, as points on the surface of a unit hypersphere. In this context, statistical depth functions provide a center-outward ordering of the data. This work proposes using a local notion of data depth in the DD-plot (Depth vs. Depth plot) to classify directional data. The proposed method is investigated through an extensive simulation study and two real-data examples.

Paper Structure

This paper contains 15 sections, 8 theorems, 98 equations, and 5 figures.

Key Result

Proposition 1

Given a point $x_i\in X\subseteq S^{q-1}$ and the reflected region $X^{R_i}$, we have: where $d_{\cos}(x,y) = 1 - \langle x, y \rangle$. Hence, $x_i$ is either a depth median or antipodal to the depth median of the region.
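
The cosine distance used in Proposition 1 can be illustrated with a minimal Python sketch. Only $d_{\cos}(x,y) = 1 - \langle x, y \rangle$ comes from the text above. The function names, the toy sample, and in particular the form of the empirical depth (2 minus the mean cosine distance to the sample, as commonly used for distance-based directional depths) are assumptions made for illustration, since the paper's Definition 1 is not reproduced on this page.

```python
import numpy as np


def cosine_distance(x, y):
    """Cosine distance d_cos(x, y) = 1 - <x, y> between two unit vectors."""
    return 1.0 - float(np.dot(x, y))


def empirical_cosine_depth(x, sample):
    """Hypothetical empirical depth of x with respect to the rows of `sample`.

    Assumed form (not taken from this page): 2 minus the mean cosine
    distance from x to the sample points, so values lie in [0, 2].
    """
    distances = 1.0 - sample @ x  # cosine distance from x to each row
    return 2.0 - float(distances.mean())


# Toy sample of unit vectors on the circle S^1 (q = 2), for illustration only.
sample = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
east = np.array([1.0, 0.0])
north = np.array([0.0, 1.0])

depth_east = empirical_cosine_depth(east, sample)
depth_north = empirical_cosine_depth(north, sample)
```

Under this assumed form, a point lying closer on average to the sample receives a larger depth, so here `depth_east` exceeds `depth_north` because two of the three sample points coincide with `east`.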

Figures (5)

  • Figure 1: Density contour plot of a trimodal distribution on the sphere (a), along with the corresponding contour plots of CDD (b) and of LCDD with $\beta=0.25$ (c).
  • Figure 2: Simulation results for Scenario 1. Rows correspond to the Setup, columns to the number of dimensions, and colors to the noise level: Low (L) in green, Medium (M) in blue, High (H) in red.
  • Figure 3: Simulation results for Scenario 2. Rows correspond to the Setup, columns to the number of dimensions, and colors to the noise level: Low (L) in green, Medium (M) in blue, High (H) in red.
  • Figure 4: Repeated 10-fold cross-validation results for the Wholesales dataset. The x-axis shows the values of $\beta$, the y-axis shows the cross-validated MR, and the dotted red line marks the value of $\beta$ achieving the minimum MR.
  • Figure 5: Repeated 10-fold cross-validation results for the Spam dataset. The x-axis shows the values of $\beta$, the y-axis shows the cross-validated MR, and the dotted red line marks the value of $\beta$ achieving the minimum MR.

Theorems & Definitions (21)

  • Definition 1: Cosine Distance Depth
  • Proposition 1
  • proof
  • Definition 2
  • Proposition 2
  • proof
  • Definition 3
  • Theorem 3: Sample behaviour in $\beta$-neighborhoods
  • proof
  • Definition 4: Population Local Cosine Distance Depth
  • ...and 11 more