BloomGML: Graph Machine Learning through the Lens of Bilevel Optimization

Amber Yijia Zheng, Tong He, Yixuan Qiu, Minjie Wang, David Wipf

TL;DR

This paper derives a more flexible class of energy functions that, when paired with various descent steps, form graph neural network (GNN) message-passing layers.

Abstract

Bilevel optimization refers to scenarios whereby the optimal solution of a lower-level energy function serves as input features to an upper-level objective of interest. These optimal features typically depend on tunable parameters of the lower-level energy in such a way that the entire bilevel pipeline can be trained end-to-end. Although not generally presented as such, this paper demonstrates how a variety of graph learning techniques can be recast as special cases of bilevel optimization or simplifications thereof. In brief, building on prior work we first derive a more flexible class of energy functions that, when paired with various descent steps (e.g., gradient descent, proximal methods, momentum, etc.), form graph neural network (GNN) message-passing layers; critically, we also carefully unpack where any residual approximation error lies with respect to the underlying constituent message-passing functions. We then probe several simplifications of this framework to derive close connections with non-GNN-based graph learning approaches, including knowledge graph embeddings, various forms of label propagation, and efficient graph-regularized MLP models. And finally, we present supporting empirical results that demonstrate the versatility of the proposed bilevel lens, which we refer to as BloomGML, referencing that BiLevel Optimization Offers More Graph Machine Learning. Our code is available at https://github.com/amberyzheng/BloomGML. Let graph ML bloom.
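The link between descent steps and message passing described in the abstract can be illustrated with a minimal sketch. It assumes the simple quadratic graph-regularized energy familiar from prior work, $\ell(H) = \|H - F\|_F^2 + \lambda\,\mathrm{tr}(H^\top L H)$, rather than the more flexible class derived in the paper. The names `energy`, `descent_layer`, `F`, `lam`, and `alpha` are hypothetical and are not taken from the BloomGML code base. Under these assumptions, each gradient step only needs neighbor sums, so it reads as one propagation layer, and the iterates approach the lower-level minimizer that an upper-level objective would consume as features.

```python
import numpy as np

def energy(H, F, L, lam):
    """Assumed lower-level energy: ||H - F||_F^2 + lam * tr(H^T L H)."""
    return np.sum((H - F) ** 2) + lam * np.trace(H.T @ L @ H)

def descent_layer(H, F, A, deg, lam, alpha):
    """One gradient step on the energy, written as a message-passing layer."""
    # Since L = D - A, the product L @ H equals deg * H - A @ H, so the
    # gradient 2(H - F) + 2 * lam * L @ H only requires neighbor sums A @ H.
    agg = A @ H  # aggregation: sum of neighbor embeddings
    grad = 2.0 * (H - F) + 2.0 * lam * (deg[:, None] * H - agg)
    return H - alpha * grad  # update: move against the gradient

# Toy path graph on 4 nodes (illustrative data only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
L = np.diag(deg) - A

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 2))  # stand-in for base features such as an MLP output
lam, alpha = 1.0, 0.1        # alpha is below 1 / (2 * (1 + lam * max eigenvalue of L))

H = F.copy()
energies = [energy(H, F, L, lam)]
for _ in range(200):  # 200 propagation steps
    H = descent_layer(H, F, A, deg, lam, alpha)
    energies.append(energy(H, F, L, lam))

# Closed-form minimizer of the quadratic energy: (I + lam * L) H* = F.
H_star = np.linalg.solve(np.eye(4) + lam * L, F)
```

In a bilevel pipeline, `H` (or `H_star`) would be passed to the upper-level loss, and gradients with respect to the parameters producing `F` would flow back through the unrolled layers. That part is omitted here.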

Paper Structure

This paper contains 51 sections, 2 theorems, 56 equations, 4 figures, 10 tables.

Key Result

Proposition 1

For any $f_U \circ f_A \circ f_M$ adhering to Definition 1, there exists a canonical form $\widetilde{f}_U \circ \widetilde{f}_A \circ \widetilde{f}_M$ following Definition 2 that provides an arbitrarily close approximation.

Figures (4)

  • Figure 1: Density of $\|\mathbf{h}^{(L)}_u - \mathbf{h}^{(L)}_v\|_2$ over edges $(u, v) \in \mathcal{E}$ whose endpoints have different labels, on the Roman dataset.
  • Figure 2: $\ell_{low}$ values versus propagation steps on Cora.
  • Figure 3: Node classification accuracy involving heterogeneous graphs. BloomGML is modified from HALO to include robust regularization within $\kappa({\mathbf{h}}; {\mathbf{x}})$.
  • Figure 4: $\ell_{low}$ values versus the number of propagation steps on the Roman dataset.

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Definition 3
  • Proposition 2