GEFL: Extended Filtration Learning for Graph Classification

Simon Zhang, Soham Mukherjee, Tamal K. Dey

TL;DR

It is shown that, under certain conditions, extended persistence surpasses both the WL[1] graph isomorphism test and 0-dimensional barcodes in terms of expressivity because it adds more global (topological) information.

Abstract

Extended persistence is a technique from topological data analysis to obtain global multiscale topological information from a graph. This includes information about connected components and cycles that are captured by the so-called persistence barcodes. We introduce extended persistence into a supervised learning framework for graph classification. Global topological information, in the form of a barcode with four different types of bars and their explicit cycle representatives, is combined into the model by the readout function which is computed by extended persistence. The entire model is end-to-end differentiable. We use a link-cut tree data structure and parallelism to lower the complexity of computing extended persistence, obtaining a speedup of more than 60x over the state-of-the-art for extended persistence computation. This makes extended persistence feasible for machine learning. We show that, under certain conditions, extended persistence surpasses both the WL[1] graph isomorphism test and 0-dimensional barcodes in terms of expressivity because it adds more global (topological) information. In particular, arbitrarily long cycles can be represented, which is difficult for finite receptive field message passing graph neural networks. Furthermore, we show the effectiveness of our method on real world datasets compared to many existing recent graph representation learning methods.

Paper Structure (32 sections, 4 theorems, 5 equations, 13 figures, 5 tables, 2 algorithms)


Key Result

Theorem 5.1

(Extended Barcode Properties) $\mathbf{PH_{ext}}(G)$ produces four multisets of bars, $\mathcal{B}^{ext}_1, \mathcal{B}_{0}^{ext}, \mathcal{B}_{0}^{low}, \mathcal{B}_{0}^{up}$, such that $|\mathcal{B}_1^{ext}|= \mathrm{dim}\, H_1=m-n+C$, $|\mathcal{B}_{0}^{ext}|=\mathrm{dim}\,H_{0}=C$, ...
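The two bar counts stated in the theorem can be checked on small graphs. The following is a minimal sketch, not the paper's extended persistence algorithm: it only counts connected components with a union-find structure and applies the formulas $|\mathcal{B}_{0}^{ext}| = C$ and $|\mathcal{B}_1^{ext}| = m - n + C$. It assumes $n$, $m$, and $C$ denote the number of vertices, edges, and connected components. The function name `bar_counts` and the two test graphs (pinwheel-free simplifications of the "two triangles" and "hexagon" graphs named in the figure captions) are illustrative assumptions.

```python
def bar_counts(n, edges):
    """Return (C, m - n + C) for an undirected graph on vertices 0..n-1.

    Assumption: C is the number of connected components and m - n + C is
    the number of independent cycles (dim H_1), matching the counts of
    B_0^ext and B_1^ext bars quoted in the theorem.
    """
    parent = list(range(n))

    def find(v):
        # Find the component representative, halving paths as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # This edge merges two components.
            parent[ru] = rv
            components -= 1
        # Otherwise the edge closes a cycle; the formula below counts it.

    m = len(edges)
    return components, m - n + components


# Hypothetical test graphs: two disjoint triangles versus one hexagon.
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(bar_counts(6, two_triangles))  # (2, 2)
print(bar_counts(6, hexagon))        # (1, 1)
```

Both graphs have six vertices of degree 2, so degree-based local information alone does not separate them, while the global counts $(C, m-n+C)$ differ.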

Figures (13)

  • Figure 1: Lower and upper filtrations for extended persistence and the resulting barcode for a graph. The green bar comes from a pairing of a green edge with a vertex in the lower filtration. Similarly, the blue bar in the upper filtration comes from a vertex-edge pairing in the upper filtration. The two dark blue bars count connected components and come from pairs of two vertices. The two red bars count cycles and come from pairs of edges. Both $\mathcal{B}_0^{ext}$ and $\mathcal{B}_1^{ext}$ bars cross from the lower filtration to the upper filtration. The multiset of bars forms the barcode. Cycle representatives are shown in both filtrations.
  • Figure 2: The extended persistence architecture (bars+cycles) for graph representation learning. The negative log likelihood (NLL) loss is used for supervised classification. The yellow arrow denotes extended persistence computation, which can compute both barcodes and cycle representatives.
  • Figure 3: Class 0: Two triangles with a pinwheel at each vertex.
  • Figure 4: Class 1: A hexagon with a pinwheel at each vertex.
  • Figure 5: Class 0: A 15-node cycle and an 85-node cycle.
  • ...and 8 more figures

Theorems & Definitions (9)

  • Theorem 5.1
  • Corollary 5.5
  • Theorem A.1
  • proof
  • proof
  • proof
  • proof
  • Corollary A.5
  • proof