GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs

Jiani Zhang, Xingjian Shi, Junyuan Xie, Hao Ma, Irwin King, Dit-Yan Yeung

TL;DR

The paper introduces GaAN, a gated attention network that assigns learnable gates to each attention head in graph aggregators, enabling selective use of information from neighbors. It provides a unified framework to convert graph aggregators into graph recurrent units, illustrated by the Graph GRU (GGRU) for spatiotemporal forecasting. Empirical results on inductive node classification (PPI, Reddit) and traffic speed forecasting (METR-LA) show state-of-the-art performance, with ablations confirming the benefit of head gates and sampling strategies. The work offers a scalable, flexible approach for both static and dynamic graph tasks, with potential extensions to edge features and NLP applications.

Abstract

We propose a new network architecture, Gated Attention Networks (GaAN), for learning on graphs. Unlike the traditional multi-head attention mechanism, which equally consumes all attention heads, GaAN uses a convolutional sub-network to control each attention head's importance. We demonstrate the effectiveness of GaAN on the inductive node classification problem. Moreover, with GaAN as a building block, we construct the Graph Gated Recurrent Unit (GGRU) to address the traffic speed forecasting problem. Extensive experiments on three real-world datasets show that our GaAN framework achieves state-of-the-art results on both tasks.
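The per-head gating idea described in the abstract can be illustrated with a minimal NumPy sketch for a single center node. The abstract does not give the exact formulas, so several details here are assumptions made for illustration only: dot-product attention scores, a gate computed by a sigmoid over the center feature concatenated with the mean neighbor feature (a simple stand-in for the paper's convolutional sub-network), and all names (`gated_attention_aggregate`, `Wq`, `Wk`, `Wv`, `Wg`).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention_aggregate(x, Z, Wq, Wk, Wv, Wg):
    """Illustrative gated multi-head aggregation for one center node.

    x  : (d,)        center node feature
    Z  : (n, d)      features of the n neighbors
    Wq : (K, d, da)  per-head query projections (hypothetical names)
    Wk : (K, d, da)  per-head key projections
    Wv : (K, d, dv)  per-head value projections
    Wg : (2d, K)     gate projection (assumed form, not from the source)
    Returns the concatenated gated heads (K*dv,) and the gates (K,).
    """
    q = np.einsum('d,kda->ka', x, Wq)        # (K, da) one query per head
    k = np.einsum('nd,kda->kna', Z, Wk)      # (K, n, da) keys per head
    v = np.einsum('nd,kdv->knv', Z, Wv)      # (K, n, dv) values per head
    scores = np.einsum('ka,kna->kn', q, k)   # (K, n) dot-product scores
    w = softmax(scores, axis=1)              # attention over neighbors
    heads = np.einsum('kn,knv->kv', w, v)    # (K, dv) per-head aggregate

    # One scalar gate in (0, 1) per head, computed from the center node
    # and a summary of its neighborhood (assumption: mean pooling).
    gate_in = np.concatenate([x, Z.mean(axis=0)])  # (2d,)
    gates = sigmoid(gate_in @ Wg)                  # (K,)

    # Scale each head by its gate, then concatenate the heads.
    return (gates[:, None] * heads).reshape(-1), gates

# Usage: three heads and three neighbors, with random weights.
rng = np.random.default_rng(0)
d, da, dv, K, n = 4, 5, 6, 3, 3
x = rng.normal(size=d)
Z = rng.normal(size=(n, d))
Wq = rng.normal(size=(K, d, da))
Wk = rng.normal(size=(K, d, da))
Wv = rng.normal(size=(K, d, dv))
Wg = rng.normal(size=(2 * d, K))
out, gates = gated_attention_aggregate(x, Z, Wq, Wk, Wv, Wg)
print(out.shape, gates.shape)  # (18,) (3,)
```

In this sketch, a gate near 0 suppresses its head and a gate near 1 passes it through unchanged, in contrast to standard multi-head attention, where every head contributes with equal weight.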

Paper Structure

This paper contains 18 sections, 6 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Illustration of a three-head gated attention aggregator with two center nodes in a mini-batch, where $|\mathcal{N}_1|=3$ and $|\mathcal{N}_2|=2$, respectively. Different colors indicate different attention heads. Gates in darker colors stand for larger values. (Best viewed in color)
  • Figure 2: Comparison of different graph aggregators. The aggregators are drawn for only one aggregation step. The nodes in red are center nodes and the nodes in blue are neighboring nodes. The bold black lines between the center node and its neighbors indicate that a learned pairwise relationship is used to calculate the relative importance. The dashed oval around the neighbors means that the interaction among neighbors is used when determining the weights. (Best viewed in color)
  • Figure 3: Ablation analysis on PPI and Reddit
  • Figure 4: Illustration of the encoder-decoder structure used in the paper. We use two layers of Graph GRUs to predict a length-3 output sequence based on a length-2 input sequence. 'SS' denotes the scheduled sampling step.