Nonparametric Statistical Inference via Metric Distribution Function in Metric Spaces

Xueqin Wang, Jin Zhu, Wenliang Pan, Junhao Zhu, Heping Zhang

Abstract

The distribution function is essential in statistical inference, and it is connected with samples to form a directed closed loop by the correspondence theorem in measure theory and by the Glivenko-Cantelli and Donsker properties. This connection creates a paradigm for statistical inference. However, existing distribution functions are defined in Euclidean spaces and are no longer convenient for rapidly evolving data objects of a complex nature. It is imperative to develop the concept of a distribution function in a more general space to meet emerging needs. Linearity allows us to use hypercubes to define the distribution function in a Euclidean space; without linearity in a metric space, we must work with the metric itself to investigate the probability measure. We introduce a class of metric distribution functions through the metric between random objects and a fixed location in metric spaces. We overcome this challenging step by proving the correspondence theorem and the Glivenko-Cantelli theorem for metric distribution functions in metric spaces, which lay the foundation for conducting rational statistical inference for metric space-valued data. We then develop a homogeneity test and a mutual independence test for non-Euclidean random objects, and present comprehensive empirical evidence to support the performance of our proposed methods.
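
To make the idea of a distribution function built from "the metric between random objects and a fixed location" concrete, here is a minimal sketch of an empirical version. It assumes the closed-ball form $F^M_{\mu}(u,v)=\mu\{x: d(u,x)\le d(u,v)\}$, which this summary does not state explicitly, and it estimates that probability by a sample proportion. The names `empirical_mdf` and `great_circle` are hypothetical, and the unit circle with geodesic distance is only an illustrative metric space.

```python
import numpy as np


def great_circle(a, b):
    """Geodesic distance between two unit vectors (illustrative metric)."""
    # Clip guards against dot products slightly outside [-1, 1].
    return float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))


def empirical_mdf(u, v, sample, metric):
    """Proportion of sample points x with metric(u, x) <= metric(u, v).

    ASSUMPTION: this is the closed-ball form of the metric distribution
    function, centered at u with radius metric(u, v); it is not taken
    verbatim from the text above.
    """
    radius = metric(u, v)
    dists = np.array([metric(u, x) for x in sample])
    return float(np.mean(dists <= radius))


# Four points on the unit circle, used as a toy sample.
sample = [
    np.array([1.0, 0.0]),
    np.array([0.0, 1.0]),
    np.array([-1.0, 0.0]),
    np.array([0.0, -1.0]),
]

# Center at (1, 0), radius set by the distance to (0, 1).
value = empirical_mdf(sample[0], sample[1], sample, great_circle)
```

Only the metric is used here, never coordinates or a linear structure, so the same function applies to any data type for which a distance can be computed.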

Paper Structure

This paper contains 17 sections, 10 theorems, 27 equations, 6 figures, 2 tables.

Key Result

Theorem 1

Denote $S=\{(u, v)\in\mathcal{M}\times\mathcal{M}:F^M_{\mu}(u,v)=F^M_{\nu}(u,v)\}$ for two given Borel probability measures, $\mu$ and $\nu,$ with their respective supports, $supp\{\mu\}$ and $supp\{\nu\},$ on $(\mathcal{M}, d)$. Suppose that $(\mathcal{M},d)$ is a Polish space and the metric $d$ is

Figures (6)

  • Figure 1: Conceptual diagram of statistical inference paradigm. The solid arrows indicate a conceptual deduction, and the dashed ones are a statistical approximation.
  • Figure 2: (a) Visualization of the direction in metric space by a 2-d Euclidean space example. (b) Visualization of the directionally $(\epsilon, \eta, L)$-limited condition in the 2-d Euclidean space. For a given $\eta>0$, consider a circle $\mathcal{N}$ with the radius $r$ such that for any two points $c_i$ and $c_j$ in $\mathcal{N}$ we have $d(c_i, c_j)/r\geq \eta$. The directionally $(\epsilon, \eta, L)$-limited condition means that there exists an $L$ such that the cardinality of $\{c_1,\ldots,c_8,\ldots\}$ is always less than $L$.
  • Figure 3: Rejection rate of the proposed homogeneity and mutual independence tests. The line type distinguishes permutation-based and spectrum-based tests; the color distinguishes the $H_0$ and $H_1$. The black dashed line is the significance level.
  • Figure 4: Rejection rate of hypothesis tests when $\kappa$ increases. A: the homogeneity test. B: the mutual independence test. Methods are distinguished by point and line. The black dashed line is the nominal significance level.
  • Figure 5: (a) Visualization for medial core, radial distance, and the order of 15,000 landmarks on each hippocampus surface. (b) One subject's functional curve of radial distance on the left and right hippocampus surfaces. The $x$-axis corresponds to 15,000 landmarks that whirlingly surround the hippocampus.
  • ...and 1 more figure

Theorems & Definitions (14)

  • Definition 1
  • Definition 2: federer2014geometric
  • Remark 1
  • Theorem 1: The fundamental correspondence theorem of MDF
  • Corollary 1
  • Theorem 2: The fundamental correspondence theorem of joint MDF
  • Corollary 2
  • Theorem 3: The Glivenko-Cantelli type property of EMDF
  • Corollary 3: A concentration inequality of EMDF
  • Theorem 4: The Convergence of Metric Distribution Process
  • ...and 4 more