New energy distances for statistical inference on infinite dimensional Hilbert spaces without moment conditions

Holger Dette, Jiajun Tang

Abstract

For statistical inference on an infinite-dimensional Hilbert space $\mathcal{H}$ without moment conditions, we introduce a new class of energy distances on the space of probability measures on $\mathcal{H}$. Each proposed distance is the integral, with respect to a reference probability measure on the Hilbert space, of the squared modulus of the difference between the corresponding characteristic functionals. Necessary and sufficient conditions are established for the reference probability measure to be characteristic, the property that guarantees that the distance defines a metric on the space of probability measures on $\mathcal{H}$. We also use these results to define new distance covariances, which can be used to measure the dependence between the marginals of a two-dimensional distribution on $\mathcal{H}^2$ without requiring moments to exist. On the basis of the new distances we develop statistical inference for Hilbert-space-valued data that does not require any moment assumptions. As a consequence, our methods are robust with respect to heavy tails in finite-dimensional data. In particular, we consider the problem of comparing the distributions of two samples and the problem of testing for independence, and we construct new minimax optimal tests for the corresponding hypotheses. We also develop aggregated (with respect to the reference measure) procedures for power enhancement and investigate the finite-sample properties by means of a simulation study.

Paper Structure (39 sections, 28 theorems, 359 equations, 3 figures, 1 algorithm)

This paper contains 39 sections, 28 theorems, 359 equations, 3 figures, 1 algorithm.

Key Result

Theorem 2.1

Let $\mathcal{H}$ be a separable Hilbert space and $\mathbb P_1,\mathbb P_2\in\mathcal{P}(\mathcal{H})$ with characteristic functionals $\varphi_1$ and $\varphi_2$, respectively. Then, $\mathbb P_1=\mathbb P_2$ if and only if $\varphi_1=\varphi_2$.
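
To make the link between Theorem 2.1 and the distances described in the abstract concrete, the following is a minimal numerical sketch, not the authors' estimator or test statistic. It approximates the integral of $|\varphi_1(w)-\varphi_2(w)|^2$ with respect to a reference measure by averaging over random draws $w$, using empirical characteristic functionals of curves observed on a grid. The function names `empirical_cf` and `plug_in_distance_sq`, the Riemann-sum inner product, and the Brownian-motion reference measure are all assumptions made for illustration; whether such a reference measure is characteristic in the paper's sense is not checked here.

```python
import numpy as np


def empirical_cf(sample: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Empirical characteristic functional of discretized curves.

    sample:     (n, p) array, n curves observed on p equispaced grid points.
    directions: (m, p) array, m draws w_1, ..., w_m from the reference measure.
    Returns the (m,) complex array with entries mean_k exp(i <w_j, x_k>).
    """
    p = sample.shape[1]
    # Assumption: the inner product on L^2([0, 1]) is approximated by a Riemann sum.
    inner = directions @ sample.T / p  # shape (m, n)
    return np.exp(1j * inner).mean(axis=1)


def plug_in_distance_sq(x: np.ndarray, y: np.ndarray, directions: np.ndarray) -> float:
    """Monte Carlo plug-in value of the integrated squared modulus
    |phi_x(w) - phi_y(w)|^2, averaged over the supplied draws w."""
    diff = empirical_cf(x, directions) - empirical_cf(y, directions)
    return float(np.mean(np.abs(diff) ** 2))


rng = np.random.default_rng(0)
p = 50  # number of grid points per curve (illustrative)

# Heavy-tailed toy data: Cauchy values have no finite mean, yet exp(i <w, x>)
# is bounded, so the quantities below are always well defined.
x = rng.standard_cauchy((100, p))
y = rng.standard_cauchy((100, p)) + 3.0  # same shape of distribution, shifted

# Assumed reference measure: discretized Brownian motion paths.
w = np.cumsum(rng.standard_normal((500, p)), axis=1) / np.sqrt(p)

print(plug_in_distance_sq(x, x, w))  # identical samples give exactly 0.0
print(plug_in_distance_sq(x, y, w))  # positive for the shifted sample
```

Because each empirical characteristic functional has modulus at most one, the plug-in value always lies in $[0, 4]$, regardless of how heavy the tails of the data are. This plug-in average is biased for finite samples and is meant only to illustrate the definition.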

Figures (3)

  • Figure 1: Sample curves of $X$ and $Y$ in Settings (1) (first and second panel) and (2) (third and fourth panel), respectively. The sample curves in Setting (2) are heavy-tailed.
  • Figure 2: Empirical rejection probabilities (y-axis) under Settings (1) (first row) and (2) (second row), for various values of $\theta$ (x-axis); left to right: $(n,N)=(40,40), (40,200),(200,40),(200,200)$. Horizontal dotted line: nominal level $\alpha=0.05$.
  • Figure 3: Empirical rejection probabilities (y-axis) under Settings (3) (first row) and (4) (second row), for two-sample inference, for various values of $\theta$ (x-axis); left to right: $(n,N)=(40,40), (40,200),(200,40),(200,200)$. Horizontal dotted line: nominal level $\alpha=0.05$.

Theorems & Definitions (36)

  • Theorem 2.1
  • Definition 2.2
  • Proposition 2.3
  • Theorem 2.4
  • Corollary 2.5
  • Corollary 2.6
  • Theorem 2.7
  • Proposition 2.8
  • Remark 2.9
  • Example 2.10
  • ...and 26 more