Adversarially Robust Topological Inference

Siddharth Vishwanath, Bharath K. Sriperumbudur, Kenji Fukumizu, Satoshi Kuriki

TL;DR

This work addresses the sensitivity of persistent homology to outliers by proposing a robust, scalable framework based on a median-of-means distance function (MoM Dist). It introduces weighted filtrations built on MoM Dist, proves consistency and near-minimax rates for sublevel persistence diagrams under adversarial contamination, and establishes stability for the corresponding filtrations. An adaptive procedure based on Lepski's method selects the tuning parameter $Q$ without sacrificing these guarantees, and an influence analysis shows that MoM-based methods reduce the impact of outliers. Experiments on synthetic and real data show robust signal recovery and favorable performance relative to existing methods, including in high-dimensional settings.

Abstract

The distance function to a compact set plays a crucial role in the paradigm of topological data analysis. In particular, the sublevel sets of the distance function are used in the computation of persistent homology -- a backbone of the topological data analysis pipeline. Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers. In this work, we develop a framework of statistical inference for persistent homology in the presence of outliers. Drawing inspiration from recent developments in robust statistics, we propose a \textit{median-of-means} variant of the distance function (\textsf{MoM Dist}) and establish its statistical properties. In particular, we show that, even in the presence of outliers, the sublevel filtrations and weighted filtrations induced by \textsf{MoM Dist} are both consistent estimators of the true underlying population counterpart and exhibit near minimax-optimal performance in adversarial settings. Finally, we demonstrate the advantages of the proposed methodology through simulations and applications.
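
The median-of-means idea behind MoM Dist can be illustrated with a minimal sketch. It assumes the sample is split into $Q$ disjoint blocks and that the estimate at a query point is the median, over blocks, of the distance to the nearest point in each block. The function name `mom_dist`, the random partition, and the toy data are illustrative choices, not taken from the paper.

```python
import numpy as np


def mom_dist(query, sample, n_blocks, rng=None):
    """Median, over disjoint blocks, of the distance from each query point to the block.

    Hypothetical helper for illustration; `n_blocks` plays the role of Q.
    """
    rng = np.random.default_rng(rng)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    sample = np.asarray(sample, dtype=float)
    if not 1 <= n_blocks <= len(sample):
        raise ValueError("n_blocks must be between 1 and the sample size")

    # Randomly partition the sample indices into n_blocks disjoint blocks.
    blocks = np.array_split(rng.permutation(len(sample)), n_blocks)

    block_dists = np.empty((n_blocks, len(query)))
    for q, idx in enumerate(blocks):
        # Distance from every query point to its nearest neighbour in block q.
        diffs = query[:, None, :] - sample[idx][None, :, :]
        block_dists[q] = np.linalg.norm(diffs, axis=2).min(axis=1)

    # The median across blocks discards the blocks contaminated by outliers.
    return np.median(block_dists, axis=0)


# Toy data (assumed): 200 points on the unit circle plus 5 outliers near the origin.
data_rng = np.random.default_rng(0)
angles = data_rng.uniform(0.0, 2.0 * np.pi, size=200)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
outliers = data_rng.normal(scale=0.01, size=(5, 2))
sample = np.vstack([circle, outliers])

origin = np.zeros((1, 2))
plain = np.linalg.norm(sample - origin, axis=1).min()       # ordinary distance to the sample
robust = mom_dist(origin, sample, n_blocks=11, rng=1)[0]    # 11 blocks > 2 * 5 outliers
```

In this toy setting the five outliers sit near the origin, so the ordinary distance to the sample is nearly zero there. The median over 11 blocks stays at the circle's radius, because at most five blocks can contain an outlier and the remaining six form a majority.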

Paper Structure (24 sections, 12 theorems, 16 equations, 11 figures, 3 tables)

Key Result

Proposition 2.1

For two compact sets $\bX, \bY \subset \R^d$, [...]. Furthermore, given two filter functions $f,g : \R^d \rightarrow \R$, [...]. Additionally, given $h: \bX \cup \bY \rightarrow \R_+$, if $h$ is $L$-Lipschitz and $\haus{\bX,\bY} \le \e$, then [...].

Figures (11)

  • Figure 1: Informal illustration of interleaving of persistence modules. Here $s\le t$ and $t \le (\beta \circ \alpha)(s)$ for two nondecreasing maps $\alpha, \beta$. See \ref{sec:interleaving} for a formal definition.
  • Figure 2: Illustration of offsets for $t=0.5$ and $f(\xv) = \inf_{\yv \in \mathds{S}^1}\norm{\xv-\yv}$.
  • Figure 3: $\Xn$ with $n=620$ points from a Lemniscate with $m=80$ outliers. Illustration of the robust weighted filtrations with $p=1$ for $V^t[\Xn, \dnq]$, RKDE $V^t[\Xn, \fns]$, DTM $V^t[\Xn, \delta_{n,k}]$, and the k-PDTM $V^t[\mathbf{C}_N]$ filtration.
  • Figure 4: Comparison of Lepski's method and the heuristic procedure for selecting the parameter $Q$.
  • Figure 5: Robust persistence diagrams for interlocked circles in $\R^{100}$ using $\dnq$ and $\delta_{n, k}$ weighted filtrations.
  • ...and 6 more figures
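
The offsets in Figure 2 can be made concrete with a short sketch. For the unit circle, the distance function in the caption has the closed form $f(\xv) = \big|\norm{\xv} - 1\big|$, so the offset at $t = 0.5$ is the annulus of points whose norm lies between $0.5$ and $1.5$. The helper names and test points below are illustrative assumptions.

```python
import numpy as np


def dist_to_unit_circle(points):
    """Distance from each 2-D point to the unit circle, i.e. | ||x|| - 1 |."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    return np.abs(np.linalg.norm(points, axis=1) - 1.0)


def in_offset(points, t):
    """Membership in the sublevel set {x : f(x) <= t} of the distance function."""
    return dist_to_unit_circle(points) <= t


# Hypothetical test points: the centre, three points within 0.5 of the circle, one far away.
pts = np.array([[0.0, 0.0], [0.6, 0.0], [1.0, 0.0], [0.0, 1.4], [2.0, 0.0]])
inside = in_offset(pts, t=0.5)
```

The centre of the circle is at distance 1 from it and is therefore excluded from the offset at $t = 0.5$, which is why the offset keeps the circle's one-dimensional hole at this scale.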

Theorems & Definitions (26)

  • Definition 2.1: Distance function
  • Proposition 2.1: Stability of persistence diagrams
  • Theorem 3.1: Minimax lower bound
  • Definition 4.1
  • Remark 4.1
  • Remark 4.2
  • Lemma 4.1
  • Theorem 5.1: Sublevel filtration
  • Remark 5.1
  • Lemma 6.1: Regularity
  • ...and 16 more