Adaptive Tokenization: On the Hop-Overpriority Problem in Tokenized Graph Learning Models

Zhibiao Wang, Yunlong Zhou, Ziwei Zhang, Mengmei Zhang, Shirui Pan, Chunming Hu, Xiao Wang

TL;DR

This work identifies a hop-overpriority problem in pre-defined graph token lists used by tokenized graph learning models, where near-neighbor emphasis can drown out global signals, especially on heterophilic graphs. It introduces LGTL, a learnable plug-in module with a gate that reweights hops and a selection module that weights nodes within each hop, enabling adaptive tokenization for both Graph Transformers and Graph LLMs. The authors provide theoretical analysis showing that LGTL can address hop-overpriority and subsume the fixed templates HO and ND, along with empirical results across text-attributed graphs and standard benchmarks demonstrating robust gains, particularly under heterophily. The findings suggest LGTL offers broad applicability and scalability for tokenized graph learning, improving performance while maintaining compatibility with existing backbones and extended token lists.

Abstract

Graph Transformers, which leverage global attention to capture long-range dependencies in graph structures, have significantly advanced graph machine learning but face prohibitive computational complexity. Tokenized Graph Learning Models (TGLMs) address this issue by converting graphs into ordered token lists for scalable processing. TGLMs also enable Large Language Models (LLMs) to handle text-attributed graphs more effectively and are therefore employed in Graph LLMs as well. However, existing TGLMs rely on hand-designed token lists, and their adaptability to diverse graph learning scenarios remains unexplored. In this paper, we first conduct extensive empirical and theoretical preliminary studies of hand-designed token lists. Surprisingly, we identify an unexplored hop-overpriority problem: common pre-defined token lists overemphasize nearby nodes and overwhelm the ability of TGLMs to balance local and global signals. This phenomenon is especially harmful for heterophilic graphs. To address this problem, we propose the Learnable Graph Token List (LGTL), a plug-and-play module that replaces hand-designed token lists in TGLMs. Specifically, LGTL adaptively adjusts the weights across hops through a graph attention gate module and prioritizes informative nodes within hops through a selection module. In this way, contextually informative nodes can be adaptively emphasized for both homophilic and heterophilic graphs. We also show theoretically that LGTL can address the hop-overpriority problem. Extensive experiments on benchmarks validate the efficacy of LGTL across both Graph Transformer and Graph LLM backbones.
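To make the idea of reweighting hops concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes that a hop token is simply the mean feature of the nodes at a given breadth-first distance from the central node, and that a gate is a softmax over one score per hop; the names `hop_tokens`, `softmax`, and `gated_readout` are hypothetical, and the gate scores are fixed by hand rather than learned. The toy graph is heterophilic in the sense that the 1-hop neighbor differs from the central node while the 2-hop node resembles it.

```python
from collections import deque
import math


def hop_tokens(adj, feats, center, num_hops):
    """Return one token per hop 0..num_hops.

    Assumption for illustration: the hop-k token is the mean feature of
    the nodes at exactly BFS distance k from `center`.
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == num_hops:
            continue  # do not expand beyond the last hop
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    dim = len(feats[center])
    tokens = []
    for k in range(num_hops + 1):
        members = [n for n, d in dist.items() if d == k]
        if members:
            tokens.append(
                [sum(feats[n][j] for n in members) / len(members) for j in range(dim)]
            )
        else:
            tokens.append([0.0] * dim)  # empty hop: zero token
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_readout(tokens, gate_scores):
    """Combine hop tokens using softmax-normalized per-hop gate scores."""
    weights = softmax(gate_scores)
    dim = len(tokens[0])
    return [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(dim)]


# Path graph 0 - 1 - 2: node 1 differs from node 0, node 2 resembles node 0.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}

tokens = hop_tokens(adj, feats, center=0, num_hops=2)
near_biased = gated_readout(tokens, [0.0, 4.0, 0.0])  # emphasizes hop 1
far_biased = gated_readout(tokens, [0.0, 0.0, 4.0])   # emphasizes hop 2
```

With the hop-1 score dominating, the readout is pulled toward the dissimilar neighbor's feature; shifting the score to hop 2 recovers a representation close to the central node's own feature. A learnable gate would, under these assumptions, be free to find such a weighting from data instead of inheriting a fixed near-neighbor bias.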

Paper Structure

This paper contains 44 sections, 8 theorems, 54 equations, 5 figures, 7 tables.

Key Result

Theorem 4.1

$\mathbf{M}_{k,i}^{\text{HO}}$ obeys the following rules:

Figures (5)

  • Figure 1: The average node-homophily for different types of nodes. "Template Better" means nodes which are predicted correctly by HO/ND but incorrectly by None, while "Template Worse" means nodes which are predicted incorrectly by HO/ND but correctly by None.
  • Figure 2: The overall framework of LGTL, including a gate module, which learns hop scores from the central node's subgraph to rebalance attention and mitigate the hop-overpriority problem, and a selection module, which constructs hop subgraphs, computes within-hop node attention, and aggregates features into tokens. These tokens form a list that is input to TGLMs; raw attention scores are adjusted by hop weights to produce task-adaptive representations for homophilic and heterophilic graphs.
  • Figure 3: The score assigned by the gate module vs. the number of hops. "L", "N", and "V" abbreviate LLaGA, NAGphormer, and VCR-Graphormer, respectively.
  • Figure 4: The label-consistency of the selection module and the node-homophily of hops. "L", "N", and "V" abbreviate LLaGA, NAGphormer, and VCR-Graphormer, respectively.
  • Figure 5: Examples demonstrating the interpretability of LGTL.

Theorems & Definitions (16)

  • Theorem 4.1: Recursive Properties of $\mathbf{M}_{k,i}^{\text{HO}}$
  • Theorem 4.2: Effective Attention Allocation
  • Theorem 4.3: Smoothness Bound of Tokenized Representations
  • Theorem 5.1
  • proof
  • Theorem 5.2
  • Proposition D.1: Contributions Relate to the Parity of Hop
  • Proposition D.2: Monotonic Decay of Row Contributions
  • Proposition D.3: Monotonic Decay of Column Contributions
  • proof
  • ...and 6 more