TopoGCL: Topological Graph Contrastive Learning

Yuzhou Chen, Jose Frias, Yulia R. Gel

TL;DR

This work proposes a new contrastive mode that targets topological representations of two augmented views of the same graph, obtained by extracting latent shape properties of the graph at multiple resolutions, and introduces a new extended persistence summary, namely extended persistence landscapes (EPL), deriving its theoretical stability guarantees.

Abstract

Graph contrastive learning (GCL) has recently emerged as a new concept which allows for capitalizing on the strengths of graph neural networks (GNNs) to learn rich representations in a wide variety of applications which involve abundant unlabeled information. However, existing GCL approaches largely tend to overlook the important latent information on higher-order graph substructures. We address this limitation by introducing the concepts of topological invariance and extended persistence on graphs to GCL. In particular, we propose a new contrastive mode which targets topological representations of the two augmented views from the same graph, yielded by extracting latent shape properties of the graph at multiple resolutions. Along with the extended topological layer, we introduce a new extended persistence summary, namely, extended persistence landscapes (EPL) and derive its theoretical stability guarantees. Our extensive numerical results on biological, chemical, and social interaction graphs show that the new Topological Graph Contrastive Learning (TopoGCL) model delivers significant performance gains in unsupervised graph classification for 11 out of 12 considered datasets and also exhibits robustness under noisy scenarios.

Paper Structure

This paper contains 19 sections, 1 theorem, 15 equations, 4 figures, 8 tables.

Key Result

Proposition 4.3

Let ${\text{EDg}}_1$ and ${\text{EDg}}_2$ be the EPDs of the piecewise linear functions $f,g:\mathscr{K}\to \mathbb{R}$, respectively. Then their corresponding $\infty$-landscape distance satisfies
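As an illustrative sketch of the objects in this bound, the following computes persistence landscapes on a grid and their sup-norm ($\infty$-landscape) distance, assuming the standard tent-function construction of landscapes (Bubenik-style) applied to diagram points with birth $\le$ death; the paper's exact EPL construction for extended diagrams may differ in details.

```python
import numpy as np

def tent(point, t):
    """Tent (triangle) function of a single (birth, death) point, evaluated at t."""
    b, d = point
    return max(0.0, min(t - b, d - t))

def landscape(diagram, k, grid):
    """k-th landscape function lambda_k on a grid: at each t, take the
    k-th largest tent value over all diagram points (zero if k > |diagram|)."""
    if k > len(diagram):
        return np.zeros_like(grid)
    values = np.array([[tent(p, t) for t in grid] for p in diagram])
    values = -np.sort(-values, axis=0)  # sort descending at each grid point
    return values[k - 1]

def landscape_inf_distance(dgm1, dgm2, k_max, grid):
    """Sup-norm distance over the first k_max landscape functions."""
    return max(
        float(np.max(np.abs(landscape(dgm1, k, grid) - landscape(dgm2, k, grid))))
        for k in range(1, k_max + 1)
    )
```

For example, the diagram $\{(0, 2)\}$ has a first landscape peaking at height 1 (at $t = 1$), and its distance to itself is 0, as expected for a metric.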

Figures (4)

  • Figure 1: The overall architecture of TopoGCL. TopoGCL consists of four components: (I) compute an extended topological feature $\Tilde{\boldsymbol{\Xi}}_i$ from the perturbed graph $\mathcal{G}_i$, feed $\Tilde{\boldsymbol{\Xi}}_i$ into the extended topological layer (ETL) $\Psi(\cdot)$, and obtain the latent extended topological representation $\Tilde{\boldsymbol{Z}}_i$; (II) feed $\mathcal{G}_i$ into the GNN encoder $f_{\text{ENCODER}}$ to generate the node embeddings $\Tilde{\boldsymbol{H}}_i$; (III) feed $\mathcal{G}^\prime_i$ into the GNN encoder $f_{\text{ENCODER}}$ to generate the node embeddings $\Tilde{\boldsymbol{H}}^\prime_i$; (IV) compute an extended topological feature $\Tilde{\boldsymbol{\Xi}}^\prime_i$ from the perturbed graph $\mathcal{G}^\prime_i$, feed it into the ETL $\Psi(\cdot)$, and obtain $\Tilde{\boldsymbol{Z}}^\prime_i$. The contrastive loss functions (Equations 1 and 5) are then applied to $\{\Tilde{\boldsymbol{H}}_i, \Tilde{\boldsymbol{H}}^\prime_i\}$ and $\{\Tilde{\boldsymbol{Z}}_i, \Tilde{\boldsymbol{Z}}^\prime_i\}$, respectively, and the two resulting losses are combined via $\ell = \alpha \times \sum^{\Upsilon}_{i=1}\ell_{i, \text{G}} + \beta \times \sum^{\Upsilon}_{i=1}\ell_{i,\text{T}}$.
  • Figure 2: Comparison between traditional persistence and extended persistence on a graph.
  • Figure 3: Extended persistence landscape.
  • Figure 4: Protein structures and EPDs in PROTEINS.
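The weighted loss combination from Figure 1 can be sketched as follows. The per-view contrastive loss here is a generic NT-Xent style loss, an assumption for illustration; the paper's actual losses are its Equations 1 and 5, which are not reproduced in this summary.

```python
import numpy as np

def ntxent_loss(z1, z2, tau=0.5):
    """Generic NT-Xent style contrastive loss between two views.
    z1, z2: (N, d) arrays of graph-level embeddings; positives on the diagonal."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = (z1 @ z2.T) / tau                      # cross-view cosine similarities
    logsumexp = np.log(np.sum(np.exp(sim), axis=1))
    return float(np.mean(logsumexp - np.diag(sim)))

def topogcl_total_loss(h1, h2, z1, z2, alpha=1.0, beta=1.0):
    """Weighted sum of the GNN-embedding loss and the topological loss,
    mirroring  l = alpha * sum l_G + beta * sum l_T  from Figure 1."""
    return alpha * ntxent_loss(h1, h2) + beta * ntxent_loss(z1, z2)
```

Setting `beta=0` recovers a purely embedding-based GCL objective, which makes explicit that the topological contrastive term is an additive extension rather than a replacement.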

Theorems & Definitions (5)

  • Definition 4.1: Extended Persistence Landscape
  • Definition 4.2: Distances between EPLs
  • Proposition 4.3: Stability of EPL
  • Remark 4.4: On theoretical properties of EPI and relationships to EPL
  • Proof