From Noisy to Native: LLM-driven Graph Restoration for Test-Time Graph Domain Adaptation

Xiangwei Lv, JinLuan Yang, Wang Lin, Jingyuan Chen, Beishui Liao

TL;DR

This work tackles Test-Time Graph Domain Adaptation (TT-GDA), where source data is unavailable at adaptation time, by introducing GRAIL, a framework that treats graph restoration as a generative process guided by a large language model (LLM). A graph diffusion trajectory tokenizer converts the graph restoration trajectory into discrete tokens, enabling LLM fine-tuning to learn autoregressive graph restoration, while reinforcement learning with alignment and confidence rewards pushes refinements toward source-domain characteristics without accessing the source data. The approach combines a Q-Former encoder, diffusion modeling, vector quantization, and an LLM-based restorer, followed by GRPO-based post-training to balance source alignment and downstream confidence. Experiments on ACMv9, Citationv1, and DBLP demonstrate consistent improvements over strong TT-GDA baselines, supporting the feasibility and practical impact of source-free graph restoration for domain adaptation.

Abstract

Graph domain adaptation (GDA) has attracted considerable attention due to its effectiveness in addressing the domain shift between training and test data. A significant bottleneck in existing GDA methods is their reliance on source-domain data, which is often unavailable due to privacy or security concerns. This limitation has driven the development of Test-Time Graph Domain Adaptation (TT-GDA), which aims to transfer knowledge without accessing the source examples. Inspired by the generative power of large language models (LLMs), we introduce a novel framework that reframes TT-GDA as a generative graph restoration problem: "restoring the target graph to its pristine, source-domain-like state". There are two key challenges: (1) constructing a reasonable graph restoration process and designing an effective encoding scheme that an LLM can understand, bridging the modality gap; and (2) devising a mechanism to ensure the restored graph acquires the intrinsic features of the source domain, even without access to the source data. To address these challenges, we propose GRAIL, which restores the target graph into a state that is well aligned with the source domain. Specifically, we first compress the node representations into compact latent features and then use a graph diffusion process to model the graph restoration process. A quantization module then encodes the restored features into discrete tokens. Building on this, an LLM is fine-tuned as a generative restorer to transform a "noisy" target graph into a "native" one. To further improve restoration quality, we introduce a reinforcement learning process guided by specialized alignment and confidence rewards. Extensive experiments demonstrate the effectiveness of our approach across various datasets.
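To make the "diffuse, then quantize" idea in the abstract more concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation. It assumes a standard Gaussian forward-noising step and a nearest-neighbour codebook lookup as stand-ins for the graph diffusion process and the quantization module, and the names `forward_noise`, `quantize`, `tokenize_trajectory`, `alpha_bars`, and `codebook` are hypothetical.

```python
import math
import random

def forward_noise(x0, alpha_bar, rng):
    """Assumed Gaussian noising: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

def quantize(vec, codebook):
    """Return the index of the nearest codebook entry (squared Euclidean distance)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def tokenize_trajectory(x0, alpha_bars, codebook, seed=0):
    """Turn one latent vector into a token sequence ordered from noisy to clean.

    alpha_bars is assumed to decrease (noise increases along the list), so the
    noised states are reversed and the clean latent x0 is appended last. The
    resulting sequence then reads as a restoration, which is the ordering an
    autoregressive model would be trained to continue.
    """
    rng = random.Random(seed)
    noised = [forward_noise(x0, ab, rng) for ab in alpha_bars]
    states = noised[::-1] + [x0]
    return [quantize(s, codebook) for s in states]

# Hypothetical toy usage: a 2-D latent and a 4-entry codebook.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = tokenize_trajectory([0.9, 0.1], [0.9, 0.5, 0.1], codebook)
```

In this toy setup each token is just a codebook index, so a sequence of such indices can be treated like ordinary vocabulary items during fine-tuning. How the paper actually orders, decodes, and trains on these tokens is described only at a high level here.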

Paper Structure

This paper contains 19 sections, 21 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The architecture of GRAIL. (a) Graph Diffusion Trajectory Tokenizer utilizes a graph diffusion process to create a source graph restoration trajectory at the embedding level. The trajectory is subsequently tokenized into discrete token IDs. An additional decoder then reconstructs the graph from these token IDs. (b) SFT on Graph Tokens uses the tokenized sequences from stage (a) to fine-tune an LLM, which enables the LLM to understand and model the graph restoration process in an autoregressive manner. (c) Post-training for Alignment further enhances the LLM's restoration capabilities by incorporating a reinforcement learning process, which leverages alignment and confidence rewards to refine target graphs with source characteristics.
  • Figure 2: Ablation studies on C $\Rightarrow$ A and A $\Rightarrow$ C.
  • Figure 3: Visualization of node embeddings on the C $\Rightarrow$ A domain adaptation task. (a) and (b) show the node distributions colored by their class labels before and after refining the target subgraph, respectively. (c) and (d) show the node distributions colored by their domain type (source or target) before and after refining.
  • Figure 4: Micro-F1 of various weight parameters.
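
The post-training stage in Figure 1(c) and the weight parameters in Figure 4 suggest a weighted combination of two rewards that is fed into GRPO. The sketch below is a hedged illustration only: it shows the group-relative advantage normalization commonly associated with GRPO, applied to a weighted sum of an alignment score and a confidence score. The function names, the default weights `w_align` and `w_conf`, and the example scores are assumptions, and the paper's actual reward definitions are not reproduced here.

```python
import math

def combined_reward(align, conf, w_align=0.5, w_conf=0.5):
    """Weighted sum of an alignment score and a confidence score (weights are hypothetical)."""
    return w_align * align + w_conf * conf

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical scores for three candidate restorations of the same target subgraph.
align_scores = [0.2, 0.6, 0.9]
conf_scores = [0.5, 0.7, 0.8]
rewards = [combined_reward(a, c) for a, c in zip(align_scores, conf_scores)]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the other candidates in the same group, only the ranking and spread of rewards within a group matter, which is one reason the relative weighting of the two rewards can affect the final Micro-F1.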