Deep Graph Infomax

Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm

TL;DR

Deep Graph Infomax (DGI) tackles unsupervised learning of graph node representations by maximizing mutual information between local patch embeddings and a global graph summary. It connects graph convolutional encoders, a readout function, and a discriminator within a contrastive framework, using negative samples generated by corruption to train representations that reflect global graph structure. Empirically, DGI achieves competitive results across transductive and inductive node classification benchmarks, often rivaling supervised baselines and surpassing other unsupervised methods. The work demonstrates the viability of local-global MI as a scalable, general-purpose pretraining objective for graph-structured data and highlights robustness to corruption strategies and embedding depth.

Abstract

We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs, both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.
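
The four components named in the TL;DR (encoder, readout, discriminator, corruption) can be sketched as a single forward pass that computes a contrastive loss. This is a minimal illustration in plain Python, not the authors' implementation. The specific choices are assumptions not stated in this summary: a single graph-convolution layer with ReLU, a sigmoid-of-mean readout, a bilinear discriminator, and corruption by shuffling feature rows. All function names are hypothetical, and no training step is shown.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (assumed GCN-style choice)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def encode(features, norm_adj, weight):
    """Encoder: one graph-convolution layer, H = relu(A_norm X W)."""
    h = matmul(matmul(norm_adj, features), weight)
    return [[max(0.0, v) for v in row] for row in h]

def readout(h):
    """Readout: sigmoid of the mean patch embedding gives the graph summary s."""
    n = len(h)
    return [sigmoid(sum(col) / n) for col in zip(*h)]

def discriminate(h_i, bilinear, summary):
    """Discriminator: sigmoid(h_i^T W s), the score that (h_i, s) is a real pair."""
    ws = [sum(w * s for w, s in zip(row, summary)) for row in bilinear]
    return sigmoid(sum(h * v for h, v in zip(h_i, ws)))

def corrupt(features, rng):
    """Corruption (assumed): shuffle feature rows across nodes, keep the edges."""
    shuffled = list(features)
    rng.shuffle(shuffled)
    return shuffled

def dgi_loss(features, adj, weight, bilinear, rng):
    """Binary cross-entropy over real (patch, summary) and corrupted pairs."""
    norm_adj = normalize_adjacency(adj)
    h_pos = encode(features, norm_adj, weight)
    h_neg = encode(corrupt(features, rng), norm_adj, weight)
    s = readout(h_pos)  # summary comes from the uncorrupted graph only
    eps = 1e-12         # guards against log(0)
    pos = [-math.log(discriminate(h, bilinear, s) + eps) for h in h_pos]
    neg = [-math.log(1.0 - discriminate(h, bilinear, s) + eps) for h in h_neg]
    return (sum(pos) + sum(neg)) / (len(pos) + len(neg))

# Toy usage: a 4-node path graph with 3 input features and 2 hidden units.
rng = random.Random(0)
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
weight = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
bilinear = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
loss = dgi_loss(features, adj, weight, bilinear, rng)
```

Minimizing this loss with respect to `weight` and `bilinear` pushes the discriminator to score patch embeddings from the real graph high and those from the corrupted graph low. This is the sense in which the objective ties local patches to the global summary.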

Paper Structure

This paper contains 16 sections, 4 theorems, 9 equations, 8 figures, 2 tables.

Key Result

Lemma 1

Let $\{{\bf X}^{(k)}\}_{k = 1}^{|{\bf X}|}$ be a set of node representations drawn from an empirical probability distribution of graphs, $p({\bf X})$, with finite number of elements, $|{\bf X}|$, such that $p({\bf X}^{(k)}) = p({\bf X}^{(k')}) \ \forall k, k'$. Let $\mathcal{R}(\cdot)$ be a determin

Figures (8)

  • Figure 1: A high-level overview of Deep Graph Infomax. Refer to Section \ref{sec:algo} for more details.
  • Figure 2: The DGI setup on large graphs (such as Reddit). Summary vectors, $\vec{s}$, are obtained by combining several subsampled patch representations, $\vec{h}_i$ (here obtained by sampling three and two neighbors in the first and second level, respectively).
  • Figure 3: t-SNE embeddings of the nodes in the Cora dataset from the raw features (left), features from a randomly initialized DGI model (middle), and a learned DGI model (right). The clusters of the learned DGI model's embeddings are clearly defined, with a Silhouette score of 0.234.
  • Figure 4: Discriminator scores, $\mathcal{D}\left(\vec{h}_i, \vec{s}\right)$, attributed to each node in the Cora dataset shown over a t-SNE of the DGI algorithm. Shown for both the original graph (left) and a negative sample (right).
  • Figure 5: The learnt embeddings of the highest-scored positive examples (upper half), and the lowest-scored negative examples (lower half).
  • ...and 3 more figures

Theorems & Definitions (8)

  • Lemma 1
  • proof
  • Corollary 1
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof