Stronger Graph Transformer with Regularized Attention Scores

Eugene Ku

TL;DR

A novel version of the edge regularization technique is proposed that alleviates the need for Positional Encoding and ultimately alleviates the out-of-memory issue of the Graph Transformer (GT).

Abstract

Graph Neural Networks (GNNs) are notorious for their memory consumption. A recent Transformer-based GNN called Graph Transformer (GT) has been shown to obtain superior performance when long-range dependencies exist. However, combining graph data with the Transformer architecture compounds the memory issue. We propose a novel version of the "edge regularization technique" that alleviates the need for Positional Encoding and ultimately alleviates GT's out-of-memory issue. It is not clear whether applying edge regularization on top of Positional Encoding is helpful. However, applying our edge regularization technique does appear to stably improve GT's performance compared to GT without Positional Encoding.

Paper Structure (13 sections, 2 equations, 8 figures, 1 table)


Figures (8)

  • Figure 1: Oversquashing [2]
  • Figure 2: Superior performance of models with Graph Transformers on LRGB [6]
  • Figure 3: RWSE allows expressiveness beyond Color Refinement Algorithm [2]
  • Figure 4: RWSE is not guaranteed to produce a unique encoding for each node [2]
  • Figure 5: LRGB Dataset description [6]
  • ...and 3 more figures