Quantiles, Ranks and Signs in Metric Spaces

Hang Liu, Xueqin Wang, Jin Zhu, Heping Zhang

Abstract

Non-Euclidean data are becoming increasingly prevalent in practice, necessitating a framework for statistical inference analogous to that for Euclidean data. The quantile is one of the most important concepts in traditional statistical inference; we introduce its counterpart, both locally and globally, for data objects in metric spaces. This is achieved by building on the metric distribution function proposed by Wang et al. (2021). Ranks and signs are defined at local and global levels as a natural consequence of the center-outward ordering of metric spaces induced by the local and global quantiles. We establish theoretical properties such as the root-$n$ consistency and uniform consistency of the local and global empirical quantiles and the distribution-freeness of ranks and signs. The empirical metric median, defined here as the 0th empirical global metric quantile, is shown both theoretically and numerically to be resistant to contamination. Extensive simulations in a number of metric spaces demonstrate the value of the quantiles. Moreover, we introduce a family of fast rank-based independence tests for a generic metric space. Monte Carlo experiments show good finite-sample performance of these tests.
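
To make the building block named in the abstract concrete, the following is a minimal sketch of an empirical metric distribution function. It assumes the usual reading of Wang et al. (2021), in which the function at a pair $({\bf u}, {\bf v})$ is the probability mass of the closed ball centered at ${\bf u}$ with radius $d({\bf u}, {\bf v})$, estimated here by a sample proportion. The names `empirical_mdf` and `geodesic`, the uniform sample on the sphere, and the sample size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def empirical_mdf(sample, dist, u, v):
    """Fraction of sample points in the closed ball centred at u with radius dist(u, v).

    Assumed definition (not taken from this page): the empirical analogue of
    mu(closed ball(u, d(u, v))).
    """
    radius = dist(u, v)
    return float(np.mean([dist(u, x) <= radius for x in sample]))


def geodesic(a, b):
    """Great-circle distance between two unit vectors (hypothetical helper)."""
    return float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))


# Illustrative data: 200 points drawn uniformly on the unit sphere S^2.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 3))
sample = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# Evaluate the empirical function from a fixed centre u at every sample point v.
u = sample[0]
values = np.array([empirical_mdf(sample, geodesic, u, v) for v in sample])
```

Only a distance function is required, so the same sketch applies to any metric space for which `dist` can be evaluated; small values in `values` correspond to points close to `u`, large values to points far from it.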

Paper Structure

This paper contains 24 sections, 14 theorems, 22 equations, 4 figures, 1 table.

Key Result

Proposition 1

If Assumption ass.density holds, then for $0 \leq \tau_1 < \tau_2 \leq 1$ and any ${\bf u} \in {\cal M}$, $\bar{q}^{\cal M} ({\bf u}, \tau_1) \subsetneq \bar{q}^{\cal M} ({\bf u}, \tau_2)$.

Figures (4)

  • Figure 1: Plots of samples generated from the standard spherical Gaussian distribution in $\mathbb{R}^2$ (left panel) and the von Mises-Fisher distribution in $\mathcal{S}^2$ (right panel), with estimates of $J_{\mu}^{\cal M}({\bf u})$ marked in red at some points.
  • Figure 2: Left panel: the tangent von Mises-Fisher distribution in the unit sphere $\mathcal{S}^2$. Right panel: space of SPD matrices where each component of the matrices is generated from a log-normal distribution (each point in the plot represents the lower diagonal values of an SPD matrix). The color reflects the level of the global metric quantile.
  • Figure 3: The rejection rates of the Spearman metric rank-based, BCov and DCov tests for the spherical normal distribution (left panel) and Cauchy distribution (right panel) of $\boldsymbol{\epsilon}_i$ for different values of $k$. The red line represents the nominal level.
  • Figure 4: The rejection rates of the Spearman metric rank-based, BCov and DCov tests for the spherical normal distribution (left panel) and Cauchy distribution (right panel) of $\boldsymbol{\epsilon}_i$ for different values of the sample size $n$. The red line represents the nominal level.

Theorems & Definitions (22)

  • Definition 1
  • Proposition 1
  • Definition 2
  • Remark 1
  • Proposition 2
  • Proposition 3
  • Definition 3
  • Proposition 4
  • Proposition 5
  • Definition 4
  • ...and 12 more