GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning

Weishuo Ma, Yanbo Wang, Xiyuan Wang, Lei Zou, Muhan Zhang

TL;DR

GILT tackles the heterogeneity of graph data with an LLM-free, tuning-free in-context learning framework that converts graph tasks into a fixed, graph-native token representation. A graph-native tokenization phase turns arbitrary graphs into contextual tokens using a deep linear GCN and PCA-based feature alignment. A two-stage ICL Transformer and a Prototypical Head then perform $N$-way $K$-shot classification without gradient updates. Pre-training on 15 diverse graph datasets enables GILT to generalize across node, link, and graph tasks, achieving state-of-the-art few-shot performance with substantially faster inference than LLM-based or tuning-based baselines. The result is a practical, scalable, out-of-the-box graph foundation model that handles numerical and structural graph data without textual descriptions or per-task fine-tuning.
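The tokenization idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pca_align` and `linear_gcn`, the zero-padding rule, and the choice of a weight-free symmetric-normalized propagation are all assumptions made for illustration, and the ICL Transformer is left out entirely.

```python
import numpy as np

def pca_align(x, d):
    """Map an (n, f) feature matrix to (n, d) via PCA, for any input width f.

    Assumption: if fewer than d principal directions exist, the remaining
    columns are zero-padded so every graph ends up with the same width.
    """
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    k = min(d, vt.shape[0])
    out = np.zeros((x.shape[0], d))
    out[:, :k] = x @ vt[:k].T
    return out

def linear_gcn(adj, x, num_layers):
    """Propagate features with D^{-1/2} (A + I) D^{-1/2}, without nonlinearities.

    Assumption: no learned weights are used here; a trained model may differ.
    """
    a = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    h = x
    for _ in range(num_layers):
        h = a_norm @ h
    return h

# Two feature spaces of different width end up with the same embedding width.
rng = np.random.default_rng(0)
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
emb_narrow = linear_gcn(path, pca_align(rng.normal(size=(4, 3)), 8), num_layers=3)
emb_wide = linear_gcn(path, pca_align(rng.normal(size=(4, 12)), 8), num_layers=3)
```

Aligning widths first is what lets a single downstream model consume graphs whose raw features have unrelated dimensions. The propagation step then mixes each node's aligned features with those of its neighbors.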

Abstract

Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs, which has motivated the development of Graph Foundational Models (GFMs). However, current GFMs are challenged by the extreme heterogeneity of graph data, where each graph can possess a unique feature space, label set, and topology. To address this, two main paradigms have emerged. The first leverages Large Language Models (LLMs), but is fundamentally text-dependent and thus struggles to handle the numerical features found in many graphs. The second pre-trains a structure-based model, but adapting it to new tasks typically requires a costly, per-graph tuning stage, creating a critical efficiency bottleneck. In this work, we move beyond these limitations and introduce the Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture. GILT introduces a novel token-based framework for in-context learning (ICL) on graphs, reframing classification tasks at the node, edge, and graph levels in a unified way. This mechanism is the key to handling heterogeneity, as it is designed to operate on generic numerical features. Further, its ability to understand class semantics dynamically from the context enables tuning-free adaptation. Comprehensive experiments show that GILT achieves stronger few-shot performance in significantly less time than LLM-based or tuning-based baselines, validating the effectiveness of our approach.

Paper Structure

This paper contains 33 sections, 5 equations, 3 figures, 11 tables.

Figures (3)

  • Figure 1: GILT begins with a Graph-Native Tokenization module that converts a few-shot task into unified tokens. This module first aligns feature dimensions, then uses a GNN to generate structure-aware embeddings. These embeddings are combined with class prototypes to form the support and query tokens. The tokens are then passed to the ICL Transformer, which features a two-stage attention mechanism for in-context reasoning and a Prototypical Head for the final classification.
  • Figure 2: Efficiency vs. accuracy on Cora node classification. The y-axis is the total inference time (lower is better) and the point size is proportional to accuracy (larger is better). All models are 5-shot, except for the LLM-based zero-shot baselines.
  • Figure 3: The influence of the number of shots (K) on GILT's few-shot performance. The x-axis represents the number of support examples per class, and the y-axis represents the classification accuracy on the test set. Each line corresponds to a different dataset.
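The captions for Figures 1 and 3 refer to class prototypes, a Prototypical Head, and K support examples per class. The sketch below shows the generic prototype-based classification idea on already-computed embeddings. It is an illustration under stated assumptions rather than GILT's actual head: the function name `prototypical_predict` is hypothetical, squared Euclidean distance is assumed as the similarity measure, and the embeddings would in practice come from the ICL Transformer, which is not modeled here.

```python
import numpy as np

def prototypical_predict(support, support_labels, query):
    """Assign each query embedding the label of its nearest class prototype.

    support:        (N*K, d) embeddings of the labeled support examples
    support_labels: (N*K,)   class label for each support example
    query:          (q, d)   embeddings to classify
    Assumption: squared Euclidean distance; the source does not state a metric.
    """
    classes = np.unique(support_labels)
    # One prototype per class: the mean of that class's support embeddings.
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    # (q, N) matrix of squared distances from each query to each prototype.
    dists = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# A 2-way 2-shot toy task with arbitrary label ids.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
labels = np.array([7, 7, 9, 9])
query = np.array([[0.1, 0.1], [4.8, 5.1]])
pred = prototypical_predict(support, labels, query)
```

Because the prototypes are recomputed from whatever support set is supplied, new label sets can be handled at inference time with no gradient updates, which is the sense in which such a head is tuning-free.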