Test-Time Training on Graphs with Large Language Models (LLMs)

Jiaxin Zhang; Yiqi Wang; Xihong Yang; Siwei Wang; Yu Feng; Yu Shi; Ruichao Ren; En Zhu; Xinwang Liu

TL;DR

This paper tackles out-of-distribution (OOD) generalization in graph neural networks (GNNs) by introducing LLMTTT, a test-time training (TTT) framework that uses Large Language Models (LLMs) as annotators to provide pseudo-labels for a small, carefully selected set of nodes in graph node classification. The method combines a hybrid active node selection strategy (uncertainty-based and distribution-based) with a two-stage training process: the first stage mitigates noise from the limited labels, and the second leverages unlabeled data through self-training, all under a constrained annotation budget. The authors provide theoretical bounds showing that LLMTTT can reduce test-domain error relative to traditional unsupervised TTT, and they report strong empirical gains across multiple text-attributed and open graph datasets. Overall, LLMTTT is presented as a scalable, model-agnostic approach that uses LLM annotation to adapt to distribution shifts in graph data at test time.
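To make the idea of hybrid node selection more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that the uncertainty signal is the prediction entropy of the pre-trained model and that representativeness can be approximated by closeness to the centroid of a node's predicted class. The function names, the rank-based mixing, and the weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def entropy_scores(probs):
    """Prediction entropy per node; higher means the model is less certain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def representativeness_scores(embeddings, probs):
    """Negative distance to the centroid of the predicted class.

    Assumption: nodes close to their class centroid are treated as more
    representative. This is a stand-in for a distribution-based criterion.
    """
    pred = probs.argmax(axis=1)
    score = np.zeros(len(embeddings))
    for c in np.unique(pred):
        mask = pred == c
        centroid = embeddings[mask].mean(axis=0)
        score[mask] = -np.linalg.norm(embeddings[mask] - centroid, axis=1)
    return score

def rank_normalize(x):
    """Map scores to [0, 1] by rank so the two signals are comparable."""
    ranks = np.argsort(np.argsort(x))
    return ranks / max(len(x) - 1, 1)

def select_nodes(embeddings, probs, budget, alpha=0.5):
    """Return indices of the `budget` nodes with the highest mixed score."""
    mixed = (alpha * rank_normalize(entropy_scores(probs))
             + (1 - alpha) * rank_normalize(representativeness_scores(embeddings, probs)))
    return np.argsort(-mixed)[:budget]
```

Setting `alpha=1.0` reduces the sketch to pure uncertainty sampling, and `alpha=0.0` to pure representativeness. The selected indices would then be sent to the LLM for annotation, up to the annotation budget.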

Abstract

Graph Neural Networks (GNNs) have demonstrated great success in various fields of multimedia. However, the distribution shift between training and test data challenges the effectiveness of GNNs. To mitigate this challenge, Test-Time Training (TTT) has been proposed as a promising approach. Traditional TTT methods require a demanding unsupervised training strategy to capture information from the test data that benefits the main task. Inspired by the strong annotation ability of Large Language Models (LLMs) on Text-Attributed Graphs (TAGs), we propose to enhance test-time training on graphs with LLMs as annotators. In this paper, we design a novel Test-Time Training pipeline, LLMTTT, which conducts test-time adaptation using LLM annotations on a carefully selected node set. Specifically, LLMTTT introduces a hybrid active node selection strategy that considers not only node diversity and representativeness, but also prediction signals from the pre-trained model. Given the annotations from LLMs, a two-stage training strategy is designed to tailor the test-time model to the limited and noisy labels. A theoretical analysis supports the validity of our method, and extensive experiments demonstrate that the proposed LLMTTT achieves a significant performance improvement over existing Out-of-Distribution (OOD) generalization methods.
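The two-stage training strategy can be illustrated with a small sketch. It is a simplification under stated assumptions rather than the paper's implementation: a linear softmax classifier on node features stands in for the GNN, LLM annotations are assumed to come with a confidence score, and the thresholds `conf_min` and `tau` as well as all function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit(X, y, W, n_classes, lr=0.5, epochs=200):
    """Gradient descent on cross-entropy for a linear classifier (GNN stand-in)."""
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        grad = X.T @ (softmax(X @ W) - Y) / len(X)
        W = W - lr * grad
    return W

def two_stage_adapt(X, W, annotated_idx, llm_labels, llm_conf,
                    n_classes, conf_min=0.7, tau=0.9):
    # Stage 1: keep only annotations the LLM is confident about (assumed
    # filtering rule), then fine-tune on those filtered nodes.
    keep = llm_conf >= conf_min
    idx1, y1 = annotated_idx[keep], llm_labels[keep]
    W = fit(X[idx1], y1, W, n_classes)

    # Stage 2: self-training. Pseudo-label the remaining nodes on which the
    # adapted model is confident, then retrain on the union of both sets.
    unlabeled = np.setdiff1d(np.arange(len(X)), idx1)
    probs = softmax(X[unlabeled] @ W)
    confident = probs.max(axis=1) >= tau
    idx2 = np.concatenate([idx1, unlabeled[confident]])
    y2 = np.concatenate([y1, probs.argmax(axis=1)[confident]])
    return fit(X[idx2], y2, W, n_classes)
```

In this sketch, a wrong LLM label with low confidence is dropped in Stage 1 before it can influence training, and Stage 2 extends supervision to unlabeled test nodes only where the adapted model is already confident.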

Paper Structure (39 sections, 6 theorems, 54 equations, 4 figures, 9 tables, 1 algorithm)

Introduction
Preliminary
Method
An Overview of LLMTTT
Hybrid Active Node Selection
Uncertainty-based Active Learning
Distribution-based Active Learning
The Selection Algorithm
Confidence-aware High-quality Annotation
Two-Stage Training
Stage 1: Training with Filtered Nodes
Stage 2: Self-training with Unlabeled Nodes
Theoretical Analysis
Experiment
Experimental Settings
...and 24 more sections

Key Result

Theorem 1

Considering data domains $X_s, X_t$, let $S_i$ represent unlabeled samples of size $m_i$ sampled from each of the two domains respectively. The total number of samples in $X_{train}$ is $N$, with a sample number ratio of $\boldsymbol{\lambda} = (\lambda_0, \lambda_1)$ in each component. If [...] where $C=2\sqrt{\left(\sum_{i=0}^{1} \frac{\omega_{i}^{2}}{\lambda_{i}}\right)\left(\frac{d \log (2 [...]

Figures (4)

Figure 1: The overall framework of LLMTTT.
Figure 2: Investigation of how different LLM accuracies affect the performance of LLMTTT. "random" means random-based selection. "pagerank" means PageRank-based selection.
Figure 3: The results of different post-filtering strategies. "none" means graph active selection without post-filtering. "conf_only" means graph active selection combined with confidence. "conf_COE" means graph active selection combined with confidence and COE.
Figure 4: Effectiveness of two-stage training.

Theorems & Definitions (6)

Theorem 1
Theorem 2
Lemma 1
Lemma 2
Theorem 1
Theorem 2
