Metric Oja Depth, New Statistical Tool for Estimating the Most Central Objects

Vida Zamanifarizhandi, Joni Virta

Abstract

The Oja depth (simplicial volume depth) is one of the classical statistical techniques for measuring the central tendency of data in multivariate space. Despite the widespread emergence of object data like images, texts, matrices or graphs, a well-developed and suitable version of Oja depth for object data is lacking. To address this shortcoming, a novel measure of statistical depth, the metric Oja depth applicable to any object data, is proposed. Two competing strategies are used for optimizing metric depth functions, i.e., finding the deepest objects with respect to them. The performance of the metric Oja depth is compared with three other depth functions (half-space, lens, and spatial) in diverse data scenarios.

Keywords: Object Data, Metric Oja depth, Statistical depth, Optimization, Metric statistics

Paper Structure

This paper contains 16 sections, 7 theorems, 14 equations, 7 figures, 1 table.

Key Result

Theorem 1

Assume that, for some $a \in \mathcal{X}$, we have $\mathbb E \{ d(X, a) \} < \infty$. Then $D_{O3}(x)$ is well-defined.

Figures (7)

  • Figure 1: The average estimation errors for each of the five methods in the correlation matrix simulation. The scale of the $y$-axis is logarithmic.
  • Figure 2: The running times (in seconds) of each method in the correlation matrix simulation. The scale of the $y$-axis is logarithmic.
  • Figure 3: The average estimation errors for each of the five methods in the hypersphere simulation. The scale of the $y$-axis is logarithmic.
  • Figure 4: The effect of increasing the dimension of the unit sphere on the in-sample estimation error, estimated using 100 replications. The lines of MSD, MLD, MOD2 and MOD3 are almost perfectly overlapping.
  • Figure 5: Comparing the performance of different optimization methods on MOD3 and MHD. DLP refers to the out-of-sample method. Left: estimation error in 7 different cases. L-BFGS-B yielded exactly the same result on MOD3 as NMKB (light red line). Right: estimation time. The scale of the $y$-axis is logarithmic.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Corollary 6
  • Theorem 7