
HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks

Yongyi Yang, Jiaming Yang, Wei Hu, Michał Dereziński

TL;DR

HERTA, a High-Efficiency and Rigorous Training Algorithm for Unfolded GNNs, accelerates the whole training process, achieves a nearly-linear-time worst-case training guarantee, and preserves the interpretability of Unfolded GNNs.

Abstract

As a variant of Graph Neural Networks (GNNs), Unfolded GNNs offer enhanced interpretability and flexibility over traditional designs. Nevertheless, they still face scalability challenges in their training cost. Although many methods have been proposed to address these scalability issues, they mostly focus on per-iteration efficiency, without worst-case convergence guarantees. Moreover, those methods typically add components to or modify the original model, thus possibly breaking the interpretability of Unfolded GNNs. In this paper, we propose HERTA: a High-Efficiency and Rigorous Training Algorithm for Unfolded GNNs that accelerates the whole training process, achieving a nearly-linear-time worst-case training guarantee. Crucially, HERTA converges to the optimum of the original model, thus preserving the interpretability of Unfolded GNNs. Additionally, as a byproduct of HERTA, we propose a new spectral sparsification method applicable to normalized and regularized graph Laplacians that ensures tighter bounds for our algorithm than existing spectral sparsifiers do. Experiments on real-world datasets verify the superiority of HERTA as well as its adaptability to various loss functions and optimizers.
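For readers new to Unfolded GNNs, the sketch below shows the usual construction under stated assumptions: each layer is one gradient step on a graph-regularized energy, so stacking layers "unfolds" an optimizer. We assume the energy $E(y) = \frac{1}{2}\|y - f\|^2 + \frac{\lambda}{2} y^\top \tilde{L} y$ with the symmetric normalized Laplacian $\tilde{L} = I - D^{-1/2} A D^{-1/2}$ and one scalar base prediction $f$ per node (standing in for $XW$); the paper's exact objective may differ, and the function names are hypothetical. Since the eigenvalues of $I + \lambda\tilde{L}$ lie in $[1, 1 + 2\lambda]$, the step size $1/(1 + 2\lambda)$ makes the unrolled iteration converge to the minimizer, which solves $(I + \lambda\tilde{L})y = f$.

```python
import math

def normalized_laplacian_matvec(adj, y):
    """Return L y for L = I - D^{-1/2} A D^{-1/2} on an unweighted graph without isolated nodes."""
    deg = [len(nbrs) for nbrs in adj]
    return [
        y[i] - sum(y[j] / math.sqrt(deg[i] * deg[j]) for j in nbrs)
        for i, nbrs in enumerate(adj)
    ]

def unfolded_forward(adj, f, lam, num_layers):
    """Run num_layers unrolled gradient steps on E(y) = 0.5*||y - f||^2 + 0.5*lam*y^T L y.

    grad E(y) = (y - f) + lam * L y, and the fixed point solves (I + lam*L) y = f.
    """
    alpha = 1.0 / (1.0 + 2.0 * lam)  # safe step: spectrum of I + lam*L lies in [1, 1 + 2*lam]
    y = list(f)  # start from the base prediction f (stands in for XW)
    for _ in range(num_layers):
        ly = normalized_laplacian_matvec(adj, y)
        y = [yi - alpha * ((yi - fi) + lam * li) for yi, fi, li in zip(y, f, ly)]
    return y

# Hypothetical 4-node path graph 0-1-2-3 with one scalar feature per node.
adj = [[1], [0, 2], [1, 3], [2]]
f = [1.0, 0.0, 0.0, -1.0]
lam = 1.0
y = unfolded_forward(adj, f, lam, num_layers=200)
ly = normalized_laplacian_matvec(adj, y)
residual = max(abs(yi + lam * li - fi) for yi, li, fi in zip(y, ly, f))
print(y, residual)  # the residual of (I + lam*L) y = f should be tiny after 200 layers
```

The per-layer contraction factor in this sketch is $2\lambda/(1 + 2\lambda)$, so a larger $\lambda$ (as in the $\lambda = 20$ experiments) means each unrolled layer makes less progress toward the fixed point.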

Paper Structure

This paper contains 45 sections, 14 theorems, 94 equations, 6 figures, and 2 algorithms.

Key Result

Theorem 1.1

HERTA solves the $\lambda$-regularized Unfolded GNN objective (the outer problem of the bilevel formulation) on a graph with $n$ nodes, $m$ edges, and $d$-dimensional node features to within accuracy $\epsilon$ in time $\tilde{O}\left( (m+nd) \left( \log \frac{1}{\epsilon}\right)^2 + d^3\right)$, as long as the number of large eigenvalues of …
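To get a feel for the bound in Theorem 1.1, the helper below evaluates its leading expression $(m+nd)(\log\frac{1}{\epsilon})^2 + d^3$. It drops the constants and polylogarithmic factors hidden by $\tilde{O}$, ignores the eigenvalue condition (which is cut off in this summary), and uses hypothetical graph sizes, so it is an order-of-magnitude sanity check only, not a runtime prediction. The function name is hypothetical.

```python
import math

def herta_leading_cost(n: int, m: int, d: int, eps: float) -> float:
    """Leading expression of the Theorem 1.1 bound, without hidden constants or polylog factors."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    graph_term = (m + n * d) * math.log(1.0 / eps) ** 2  # nearly linear in the input size m + n*d
    feature_term = d ** 3  # independent of the graph size
    return graph_term + feature_term

# Hypothetical sizes: 100k nodes, 1M edges, 100 features, target accuracy 1e-6.
base = herta_leading_cost(n=100_000, m=1_000_000, d=100, eps=1e-6)
# Doubling n and m roughly doubles the cost, because d**3 is small at these sizes.
doubled = herta_leading_cost(n=200_000, m=2_000_000, d=100, eps=1e-6)
print(f"{base:.3e}  ratio after doubling: {doubled / base:.3f}")
```

Tightening the accuracy only enters through the squared logarithm, so going from $\epsilon = 10^{-6}$ to $\epsilon = 10^{-12}$ multiplies the graph term by about 4 rather than by $10^6$.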

Figures (6)

  • Figure 1: Training-loss comparison between HERTA and standard optimizers on MSE loss with $\lambda = 1$. Datasets, left to right: ogbn-arxiv, citeseer, pubmed.
  • Figure 2: Training-loss comparison between HERTA and standard optimizers on cross-entropy (CE) loss with $\lambda = 1$. Datasets, left to right: ogbn-arxiv, citeseer, pubmed.
  • Figure 3: Training-loss comparison between HERTA and standard optimizers on MSE loss with $\lambda = 20$. Datasets, left to right: ogbn-arxiv, citeseer, pubmed.
  • Figure 4: Training-loss comparison between HERTA and standard optimizers on CE loss with $\lambda = 20$. Datasets, left to right: ogbn-arxiv, citeseer, pubmed.
  • Figure 5: Training-loss comparison between HERTA and standard optimizers on Cora with $\lambda = 1$. Left: CE loss. Right: MSE loss.

Theorems & Definitions (20)

  • Theorem 1.1: Informal version of Theorem 5.1
  • Definition 5.1: Effective Laplacian dimension
  • Theorem 5.1: Main result
  • Definition 5.2: Linear solver
  • Lemma 5.1: Convergence
  • Lemma 5.2: Regularized spectral sparsifier
  • Lemma 5.3: Preconditioner
  • Lemma 5.4: Well-conditioned Hessian
  • Lemma 1.1: SDD solver
  • Lemma 1.2: SRHT (Tropp)
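Definition 5.1 (effective Laplacian dimension) is only named in this listing, and Theorem 1.1 conditions on the number of large eigenvalues. As an illustration only, the sketch below assumes the common ridge-style form $\sum_i \frac{\lambda\mu_i}{1+\lambda\mu_i} = \mathrm{tr}\left(\lambda\tilde{L}(I+\lambda\tilde{L})^{-1}\right)$, where $\mu_i$ are the eigenvalues of the normalized Laplacian $\tilde{L}$; the paper's definition may be normalized differently. It uses a cycle graph because that spectrum has the closed form $\mu_k = 1 - \cos(2\pi k/n)$, so no eigensolver is needed. All function names are hypothetical.

```python
import math

def cycle_normalized_laplacian_eigs(n):
    """Eigenvalues of I - D^{-1/2} A D^{-1/2} for the cycle graph C_n (n >= 3)."""
    return [1.0 - math.cos(2.0 * math.pi * k / n) for k in range(n)]

def effective_laplacian_dimension(eigs, lam):
    """Assumed ridge-style form: sum_i lam*mu_i / (1 + lam*mu_i) = tr(lam*L (I + lam*L)^{-1}).

    Each term is near 1 when lam*mu_i >> 1 and near 0 when lam*mu_i << 1,
    so the sum soft-counts eigenvalues of L that are large relative to 1/lam.
    """
    return sum(lam * mu / (1.0 + lam * mu) for mu in eigs)

def count_large_eigs(eigs, lam):
    """Hard count of eigenvalues with lam*mu_i >= 1, for comparison with the soft count."""
    return sum(1 for mu in eigs if lam * mu >= 1.0)

eigs = cycle_normalized_laplacian_eigs(100)
for lam in (1.0, 20.0):
    print(lam, effective_laplacian_dimension(eigs, lam), count_large_eigs(eigs, lam))
```

Both quantities grow with $\lambda$; the soft count is preferred in analyses of this kind because it varies smoothly instead of jumping when an eigenvalue crosses the threshold $1/\lambda$.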