Persistent reachability homology in machine learning applications

Luigi Caputi, Nicholas Meadows, Henri Riihimäki

TL;DR

The paper investigates persistent reachability homology (PRH) as a condensation-based, topology-driven feature extractor for directed graphs and applies it to epileptic seizure detection from EEG-derived networks. PRH computes homology on the reachability poset obtained by condensing strongly connected components. It is linked to Hochschild cohomology via $\mathrm{HH}^i(k\mathcal{R}(G))$ and is faster to compute than homology of the directed flag complex. The pipeline uses Betti curves and their integrals as features for support vector machines. In it, PRH outperforms the directed flag complex approach in 7 of 8 model comparisons, with best accuracies around $82\%$ on a dataset of 100 recordings from 16 patients. The results highlight that PRH captures complementary structural information and that combining multiple homology theories can enhance TDA-based machine learning on digraph-structured data.
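The condensation step described above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the authors' code. It assumes the digraph is a dict mapping each vertex to a list of successors. Strongly connected components are found with Tarjan's algorithm, which is recursive and so only suits modest graph sizes. The helpers `condense` and `reachability_order` are hypothetical names that return the components and the strict reachability order between them.

```python
from itertools import count


def condense(adj):
    """Tarjan's algorithm: return (comp, sccs), where comp maps each vertex to
    the index of its strongly connected component and sccs lists the components."""
    index, low, comp = {}, {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = count()

    def visit(v):
        index[v] = low[v] = next(counter)
        stack.append(v)
        on_stack.add(v)
        for w in adj.get(v, ()):
            if w not in index:          # tree edge: recurse
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:         # edge to a vertex still on the stack
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            members = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = len(sccs)
                members.add(w)
                if w == v:
                    break
            sccs.append(members)

    vertices = set(adj) | {w for ws in adj.values() for w in ws}
    for v in sorted(vertices):
        if v not in index:
            visit(v)
    return comp, sccs


def reachability_order(adj):
    """Condense adj and return (sccs, order), where (a, b) in order means
    component b is reachable from component a and a != b."""
    comp, sccs = condense(adj)
    # Edges of the condensation, which is a DAG; edges inside a component are dropped.
    dag = {c: set() for c in range(len(sccs))}
    for v, ws in adj.items():
        for w in ws:
            if comp[v] != comp[w]:
                dag[comp[v]].add(comp[w])
    order = set()
    for c in dag:
        # Iterative depth-first search from c collects every component below it.
        seen, todo = set(), list(dag[c])
        while todo:
            d = todo.pop()
            if d not in seen:
                seen.add(d)
                todo.extend(dag[d])
        order |= {(c, d) for d in seen}
    return sccs, order
```

For example, consider the digraph `{0: [1], 1: [2], 2: [0, 3], 3: [4]}`. Its cycle on 0, 1, 2 collapses to one component. The resulting order places that component above {3} and {4}, and {3} above {4}. The condensed poset therefore has three elements instead of five vertices.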

Abstract

We explore the recently introduced persistent reachability homology (PRH) of digraph data, i.e. data in the form of directed graphs. In particular, we study the effectiveness of PRH in a network classification task for a key neuroscience problem: epilepsy detection. PRH is a variant of the persistent homology of digraphs, which is more traditionally based on the directed flag complex (DPH). A main advantage of PRH is that it considers the condensations of the digraphs appearing in the persistent filtration and is thus computed from smaller digraphs. We compare the effectiveness of PRH to that of DPH and show that PRH outperforms DPH in the classification task. We use Betti curves and their integrals as topological features and implement our pipeline with support vector machines.
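The feature construction mentioned in the abstract can be sketched as follows. This is a minimal sketch, not the paper's implementation. It assumes each recording yields a table of Betti numbers $\beta_i$ at a fixed sequence of filtration values. It also assumes the integral of a Betti curve is approximated with the trapezoidal rule over those values; the paper may use a different quadrature or normalisation. The function names `betti_features` and `betti_integrals` are hypothetical.

```python
def betti_features(betti, degrees=(0, 1, 2)):
    """Concatenate Betti curves into one feature vector.

    betti[s][i] is beta_i at filtration step s. The output lists beta_0 at
    every step, then beta_1 at every step, and so on."""
    return [betti[s][i] for i in degrees for s in range(len(betti))]


def betti_integrals(betti, thresholds, degrees=(0, 1, 2)):
    """Approximate the integral of each Betti curve over the filtration
    values with the trapezoidal rule (an assumed quadrature)."""
    integrals = []
    for i in degrees:
        curve = [row[i] for row in betti]
        area = 0.0
        for k in range(1, len(thresholds)):
            width = thresholds[k] - thresholds[k - 1]
            area += 0.5 * (curve[k] + curve[k - 1]) * width
        integrals.append(area)
    return integrals
```

For instance, take `betti = [[5, 0, 0], [3, 1, 0], [1, 2, 1]]` with `thresholds = [0.0, 0.5, 1.0]`. Here `betti_features(betti)` returns `[5, 3, 1, 0, 1, 2, 0, 0, 1]` and `betti_integrals(betti, thresholds)` returns `[3.0, 1.0, 0.25]`. Passing `degrees=(0, 1)` restricts both vectors to $\beta_0$ and $\beta_1$.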

Paper Structure

This paper contains 10 sections, 4 theorems, 28 equations, 5 figures, 1 table.

Key Result

Proposition 1.6

Let $P$ be a poset and $G(P)$ its underlying directed graph. Then, we have for all $i\in \mathbb{N}$.
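The statement above concerns a poset $P$ and its underlying digraph $G(P)$. One standard way to compute the homology of a finite poset is through its order complex, whose $n$-simplices are the chains $p_0 < p_1 < \dots < p_n$. By the Gerstenhaber–Schack theorem, the simplicial cohomology of this complex agrees with the Hochschild cohomology of the incidence algebra $kP$. The sketch below is illustrative, not the paper's implementation. It computes Betti numbers of the order complex with $\mathbb{F}_2$ coefficients, an assumption chosen so that boundary maps need no signs. The poset is given as a list of elements and a transitive set of pairs `(p, q)` meaning $p < q$. The function names `chains`, `rank_gf2` and `poset_betti` are hypothetical.

```python
def chains(elements, less, max_dim):
    """Chains p0 < p1 < ... < pn of a finite poset, grouped by dimension n <= max_dim.
    `less` must be a transitive set of pairs (p, q) meaning p < q."""
    succ = {p: [q for q in elements if (p, q) in less] for p in elements}
    levels = [[(p,) for p in elements]]
    for _ in range(max_dim):
        # Extend every chain by an element strictly above its top element.
        levels.append([c + (q,) for c in levels[-1] for q in succ[c[-1]]])
    return levels


def rank_gf2(columns):
    """Rank over F_2 of vectors encoded as integer bitmasks (XOR basis)."""
    basis = {}
    for v in columns:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)


def poset_betti(elements, less, max_deg=2):
    """Mod-2 Betti numbers beta_0, ..., beta_max_deg of the order complex."""
    simplices = chains(elements, less, max_deg + 1)
    position = [{s: k for k, s in enumerate(level)} for level in simplices]
    ranks = [0]  # ranks[n] = rank of the boundary map C_n -> C_{n-1}; C_{-1} = 0
    for n in range(1, max_deg + 2):
        columns = []
        for s in simplices[n]:
            mask = 0
            for k in range(len(s)):
                # Over F_2 the boundary is the plain sum of the codimension-1 faces.
                mask ^= 1 << position[n - 1][s[:k] + s[k + 1:]]
            columns.append(mask)
        ranks.append(rank_gf2(columns))
    return [len(simplices[n]) - ranks[n] - ranks[n + 1] for n in range(max_deg + 1)]
```

For instance, the four-element "crown" poset with $0, 1 < 2, 3$ has a circle as its order complex, and `poset_betti([0, 1, 2, 3], {(0, 2), (0, 3), (1, 2), (1, 3)})` returns `[1, 1, 0]`. The number of chains grows quickly with chain length, so this brute-force enumeration only suits small posets.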

Figures (5)

  • Figure 1: The mean Betti numbers 0, 1, and 2 with respect to the edge probability $p$ over 200 realisations of the directed flag complex of the Erdős–Rényi random digraph $G(100,p)$.
  • Figure 2: The mean reachability Betti numbers 1 and 2 with respect to the edge probability $p$ over 300 realisations of the Erdős–Rényi random digraph $G(100,p)$. Note that the range of $p$ is much smaller than in Figure 1, showing that the reachability homology is confined to a very small range of $p$.
  • Figure 3: Classification accuracies of SVM with linear kernel using the Betti number feature vectors (top) and Betti integral feature vectors (bottom). The $x$-axis shows the lower threshold values used to initially prune the graphs. Each threshold value is replicated three times, corresponding to 2-, 3-, and 5-fold cross-validation, in that order. The vertical middle line separates the results into a left half, where only Betti numbers 0 and 1 are used, and a right half, where Betti numbers 0, 1, and 2 are used.
  • Figure 4: Classification accuracies of SVM with RBF kernel using the Betti number feature vectors (top) and Betti integral feature vectors (bottom). The $x$-axis shows the threshold values used to initially prune the graphs. Each threshold value is replicated three times, corresponding to 2-, 3-, and 5-fold cross-validation, in that order. The vertical middle line separates the results into a left half, where only Betti numbers 0 and 1 are used, and a right half, where Betti numbers 0, 1, and 2 are used.
  • Figure 5: Feature importance in the linear SVM, measured as the appearance frequency of the most important Betti number features, normalised by the number of classification runs. Features 1-11 refer to $\beta_0$ at the 11 filtration steps, in filtration order; similarly, features 12-22 are $\beta_1$ and features 23-33 are $\beta_2$.
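The feature-importance measure in Figure 5 can be sketched under some assumptions. "Most important" is taken here to mean the `top_k` features with the largest absolute weight in a linear SVM. One source of such weights is the single row of `coef_` of a fitted scikit-learn `LinearSVC` in a binary problem. The default `top_k = 5` is an arbitrary illustrative value. `feature_label` maps a 1-based feature index back to its Betti degree and filtration step, using the layout given in the caption. Both function names are hypothetical.

```python
from collections import Counter


def feature_label(j, n_steps=11):
    """Map a 1-based feature index to (Betti degree, 1-based filtration step):
    features 1-11 are beta_0, 12-22 are beta_1 and 23-33 are beta_2."""
    degree, step = divmod(j - 1, n_steps)
    return degree, step + 1


def importance_frequency(weight_runs, top_k=5):
    """For each 1-based feature index, the fraction of classification runs in
    which it is among the top_k features by absolute linear-SVM weight."""
    counts = Counter()
    for w in weight_runs:
        ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
        counts.update(j + 1 for j in ranked[:top_k])
    return {j: c / len(weight_runs) for j, c in sorted(counts.items())}
```

For example, `feature_label(12)` gives `(1, 1)`, i.e. $\beta_1$ at the first filtration step. Features that never reach the top `top_k` in any run are simply absent from the returned dictionary.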

Theorems & Definitions (12)

  • Definition 1.1
  • Definition 1.2
  • Example 1.3
  • Example 1.4
  • Example 1.5
  • Proposition 1.6
  • Definition 1.7
  • Definition 1.8
  • Theorem 1.9
  • Proposition 1.10
  • ...and 2 more