Are LLMs Better GNN Helpers? Rethinking Robust Graph Learning under Deficiencies with Iterative Refinement

Zhaoyan Wang, Zheng Gao, Arogya Kharel, In-Young Ko

TL;DR

This paper investigates the robustness of GNNs under compound deficiencies in text-attributed graphs and questions the blanket superiority of LLM-based augmentations. It conducts a comprehensive benchmark comparing conventional GNNs and LLM-enhanced approaches, revealing that LLM-enhanced methods can underperform under modest perturbations and suffer from semantic homogeneity. To address this, the authors introduce RoGRAD, an iterative Retrieval-Augmented Graph Learning framework comprising Semantic-Guided Generation (SGGM), Graph Enrichment, and Retrieval-Refined Contrastive Learning (R2CL), a contrastive learning scheme that benefits from LLM-refined views. RoGRAD leverages retrieval-grounded augmentations and multi-round refinement to produce more discriminative, robust node representations, achieving up to 82.43% average improvement over baselines and offering a principled path to robust graph learning in real-world deficient data settings.

Abstract

Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data, such as text-attributed graphs. Yet in real-world scenarios, such graphs exhibit deficiencies that substantially undermine GNN performance. While prior GNN-based augmentation studies have explored robustness against individual imperfections, a systematic understanding of how graph-native and Large Language Model (LLM)-enhanced methods behave under compound deficiencies is still missing. Specifically, there has been no comprehensive investigation comparing conventional approaches and recent LLM-on-graph frameworks, leaving their relative merits unclear. To fill this gap, we conduct the first empirical study that benchmarks these two lines of methods across diverse graph deficiencies, revealing overlooked vulnerabilities and challenging the assumption that LLM augmentation is consistently superior. Building on these empirical findings, we propose the Robust Graph Learning via Retrieval-Augmented Contrastive Refinement (RoGRAD) framework. Unlike prior one-shot LLM-as-Enhancer designs, RoGRAD is the first iterative paradigm that leverages Retrieval-Augmented Generation (RAG) to inject retrieval-grounded augmentations, supplying class-consistent, diverse augmentations and enforcing discriminative representations through iterative graph contrastive learning. It transforms LLM augmentation for graphs from static signal injection into dynamic refinement. Extensive experiments demonstrate RoGRAD's superiority over both conventional GNN and LLM-enhanced baselines, achieving up to 82.43% average improvement.

Paper Structure

This paper contains 36 sections, 8 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Accuracy under increasing attack intensities.
  • Figure 2: GCN Performance under compound deficiencies.
  • Figure 3: Overall architecture of RoGRAD. RoGRAD establishes the first iterative RAG+GCL paradigm for LLM-on-graph, replacing static one-shot augmentation with dynamic multi-round refinement.
  • Figure 4: Prompts for semantic-guided generation.
  • Figure 5: Prompts for retrieval-refined contrastive learning.
  • ...and 6 more figures