DAM-GT: Dual Positional Encoding-Based Attention Masking Graph Transformer for Node Classification

Chenyang Li, Jinsong Chen, John E. Hopcroft, Kun He

TL;DR

DAM-GT tackles two core issues in neighborhood-aware tokenized graph Transformers: neighborhood tokens fail to preserve attribute correlations within a neighborhood, and standard self-attention suffers from attention diversion, in which high-hop tokens receive disproportionate attention. It introduces a dual positional encoding that combines an attribute-aware, clustering-based encoding with topology-based eigenvector cues, together with a mask-aware self-attention mechanism that strengthens target–neighborhood interactions. Both operate on Hop2Token-derived neighborhood tokens in a Transformer backbone. The approach achieves state-of-the-art node classification performance across 12 datasets of varying scale and homophily level; ablations validate both components, and a propagation-step analysis shows scale-aware behavior. The method scales to large graphs and offers a framework for integrating topological and attribute information in graph Transformers.

Abstract

Neighborhood-aware tokenized graph Transformers have recently shown great potential for node classification tasks. Despite their effectiveness, our in-depth analysis of neighborhood tokens reveals two critical limitations in the existing paradigm. First, current neighborhood token generation methods fail to adequately capture attribute correlations within a neighborhood. Second, the conventional self-attention mechanism suffers from attention diversion when processing neighborhood tokens, where high-hop neighborhoods receive disproportionate focus, severely disrupting information interactions between the target node and its neighborhood tokens. To address these challenges, we propose DAM-GT, Dual positional encoding-based Attention Masking graph Transformer. DAM-GT introduces a novel dual positional encoding scheme that incorporates attribute-aware encoding via an attribute clustering strategy, effectively preserving node correlations in both topological and attribute spaces. In addition, DAM-GT formulates a new attention mechanism with a simple yet effective masking strategy to guide interactions between target nodes and their neighborhood tokens, overcoming the issue of attention diversion. Extensive experiments on various graphs with different homophily levels as well as different scales demonstrate that DAM-GT consistently outperforms state-of-the-art methods in node classification tasks.
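To make the masking idea concrete, here is a minimal NumPy sketch of self-attention over one node's token sequence with a boolean attention mask. The abstract does not spell out the mask pattern, so the pattern used here is an assumption chosen for illustration: token 0 stands for the target node and may attend to every token, while each neighborhood token may attend only to itself and to the target node. The names `build_mask` and `masked_self_attention` are hypothetical, and the sketch is single-head with no learned output projection.

```python
import numpy as np


def build_mask(num_tokens: int) -> np.ndarray:
    """Boolean mask; True means the query (row) may attend to the key (column).

    Assumed pattern (not taken from the paper): token 0 is the target node.
    """
    mask = np.eye(num_tokens, dtype=bool)  # every token attends to itself
    mask[0, :] = True                      # target node attends to all tokens
    mask[:, 0] = True                      # every token attends to the target node
    return mask


def masked_self_attention(tokens, w_q, w_k, w_v, mask):
    """Single-head scaled dot-product attention with disallowed pairs masked out."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Masked positions get -inf so they receive exactly zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability (each row has a finite entry).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: one target token plus three neighborhood tokens, feature size 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
w_q, w_k, w_v = (0.1 * rng.normal(size=(8, 8)) for _ in range(3))
mask = build_mask(4)
out, weights = masked_self_attention(tokens, w_q, w_k, w_v, mask)
```

With this assumed mask, neighborhood tokens cannot exchange attention with each other, so high-hop tokens have no path to dominate the other neighborhood tokens' attention rows; whether this matches the paper's exact formulation should be checked against its equations.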

Paper Structure

This paper contains 34 sections, 16 equations, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: The attention matrices on the Photo and Reddit datasets. Deeper colors represent higher attention values.
  • Figure 2: The overall view of DAM-GT. Given the input graph, DAM-GT first uses dual positional encoding to enhance the original node features. The Hop2Token module then generates neighborhood-aware token sequences as input to the Transformer-based backbone, where DAM-GT applies a mask-aware self-attention mechanism to learn node representations. Finally, DAM-GT adopts a readout layer and a multilayer perceptron for final label prediction.
  • Figure 3: Comparison results of DAM-GT and its variants without positional encoding.
  • Figure 4: Study on the propagation steps $S$.
  • Figure 5: The attention matrices of all four heads in the backbone on the Photo dataset.
  • ...and 8 more figures
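
The token-generation step named in the Figure 2 caption can be sketched as follows. This page does not describe Hop2Token in detail, so the sketch assumes the common formulation in which the token for hop s is the feature matrix propagated s times through a normalized adjacency matrix. The symmetric normalization with self-loops, and the function name `hop2token`, are assumptions for illustration; `num_steps` plays the role of the propagation steps $S$ from Figure 4, and `features` stands for node features that have already been enhanced by positional encoding.

```python
import numpy as np


def hop2token(adj: np.ndarray, features: np.ndarray, num_steps: int) -> np.ndarray:
    """Return per-node token sequences of shape (n, num_steps + 1, d).

    Token 0 is the node's own feature vector; token s aggregates its s-hop
    neighborhood. The normalization below is an assumed choice.
    """
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1)) # degrees are >= 1, so this is safe
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    tokens = [features]
    current = features
    for _ in range(num_steps):
        current = a_norm @ current                # propagate one more hop
        tokens.append(current)
    return np.stack(tokens, axis=1)


# Usage: a 3-node path graph with one-hot features and S = 2.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
features = np.eye(3)
tokens = hop2token(adj, features, num_steps=2)
```

Because the propagation is a fixed precomputation, the token sequences can be built once before training and then fed to the Transformer backbone in mini-batches of nodes.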