Two-sample tests for relevant differences in persistence diagrams

Johannes Krebs; Daniel Rademacher

Abstract

We study two-sample tests for relevant differences in persistence diagrams obtained from $L^p$-$m$-approximable data $(\mathcal{X}_t)_t$ and $(\mathcal{Y}_t)_t$. To this end, we compare variance estimates with respect to the Wasserstein metrics on the space of persistence diagrams. In detail, we consider two test procedures. The first compares the Fréchet variances of the two samples, based on estimators for the Fréchet mean of the observed persistence diagrams $\operatorname{PD}(\mathcal{X}_i)$ ($1\le i\le m$) and $\operatorname{PD}(\mathcal{Y}_j)$ ($1\le j\le n$), respectively, of a given feature dimension. We use classical functional central limit theorems to establish consistency of the testing procedure. The second procedure relies on a comparison of the so-called independent copy variances of the respective samples. Technically, this leads to functional central limit theorems for U-statistics built on $L^p$-$m$-approximable sample data.
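To make the second procedure more concrete, the following is a minimal sketch of how an independent copy variance could be estimated by a U-statistic over pairs of persistence diagrams. Several points are assumptions for illustration and are not taken from the paper: the kernel $h(x,y) = \tfrac{1}{2} W_r^2(x,y)$, the sup-norm ground cost, the brute-force matching (only feasible for very small diagrams), and the function names `wasserstein` and `inco_variance`. The sketch covers only the point estimate, not the calibration of the test under dependence.

```python
from itertools import combinations, permutations


def _cost(p, q, r):
    """Ground cost to the power r; None stands for the diagonal."""
    if p is None and q is None:
        return 0.0
    if p is None:
        p, q = q, p
    if q is None:
        # Sup-norm distance from (birth, death) to the diagonal.
        return (abs(p[1] - p[0]) / 2.0) ** r
    return max(abs(p[0] - q[0]), abs(p[1] - q[1])) ** r


def wasserstein(d1, d2, r=2):
    """r-Wasserstein distance between two small diagrams (brute force)."""
    # Augment each diagram with one diagonal slot per point of the other.
    a = list(d1) + [None] * len(d2)
    b = list(d2) + [None] * len(d1)
    if not a:
        return 0.0
    best = min(
        sum(_cost(p, b[j], r) for p, j in zip(a, perm))
        for perm in permutations(range(len(b)))
    )
    return best ** (1.0 / r)


def inco_variance(diagrams, r=2):
    """U-statistic with the assumed kernel h(x, y) = W_r(x, y)^2 / 2."""
    n = len(diagrams)
    if n < 2:
        raise ValueError("need at least two diagrams")
    total = sum(
        0.5 * wasserstein(x, y, r) ** 2 for x, y in combinations(diagrams, 2)
    )
    return total / (n * (n - 1) / 2)


# Toy sample: three diagrams, each with a single (birth, death) point.
sample = [[(0.0, 1.0)], [(0.0, 1.0)], [(0.0, 3.0)]]
estimate = inco_variance(sample)
```

For realistic diagram sizes the permutation search would have to be replaced by an assignment solver or a dedicated persistent homology library; the brute-force version is used here only to keep the sketch free of dependencies.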

Paper Structure

This paper contains 7 sections, 13 theorems, 119 equations.

Tests for persistence diagrams
A test statistic for relevant differences of Fréchet variances
A test statistic for relevant differences of inco-variances
Mathematical background: a two-parameter FCLT for U-statistics
Mathematical details
Two sample test for Fréchet variances
Two sample tests for inco-variances

Key Result

Theorem 1.2

Let the regularity conditions of Assumption A:FrechetVariances (r) be satisfied for some $r>4$ and suppose there is a constant $\tau \in (0,1)$ such that $\lim_{m,n \to \infty}m/(m+n) = \tau$. Moreover, let $X^*_t = W_r^2(\operatorname{PD}(\mathcal{X}_t), \widehat{\mu}_\mathcal{X})$, resp., $Y^*_t = \ldots$ [...] where $B$ is a standard Brownian motion, $\xi = 2\sqrt{\frac{\Gamma_\mathcal{X}}{\tau} + \ldots}$ [...]

Theorems & Definitions (20)

Theorem 1.2
Theorem 1.3
Proof of Theorem \ref{T:FrechetVarTest}
Theorem 1.4
Theorem 1.5
Remark 2.2
Theorem 2.3: FCLT for $U_n(h)$
Proof of Theorem \ref{Thrm:FCLTforUStatistic}
Theorem 2.4: Two-parameter Donsker type FCLT
Theorem 2.5: Moment condition for $U_n(h_2)$
...and 10 more
