When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach
Zhihan Zhang, Xunkai Li, Yilong Zuo, Zhaoxin Fan, Zhenjun Li, Bing Zhou, Rong-Hua Li, Guoren Wang
TL;DR
This work tackles the data quality bottleneck in text-attributed graphs (TAGs) by introducing LAGA, a unified, automated framework powered by large language models. LAGA casts TAG quality improvement as a closed-loop, data-centric process implemented by four collaborating agents (Detection, Planning, Action, and Evaluation) that detect multi-modal defects, plan adaptive repairs, apply text, structure, and label enhancements, and assess the resulting improvements. Empirical results across five datasets, nine degradation scenarios, and multiple backbones show that LAGA consistently achieves state-of-the-art performance and robustness, aided by scalability strategies such as edge sampling and subgraph partitioning. The approach demonstrates that holistic TAG quality optimization is crucial for reliable graph analytics and lays the groundwork for extension to heterogeneous or more complex graph settings.
Abstract
Text-attributed graphs (TAGs) have become a key form of graph-structured data in modern data management and analytics, combining structural relationships with rich textual semantics for diverse applications. However, the effectiveness of analytical models, particularly graph neural networks (GNNs), is highly sensitive to data quality. Our empirical analysis shows that both conventional and LLM-enhanced GNNs degrade notably under textual, structural, and label imperfections, underscoring TAG quality as a key bottleneck for reliable analytics. Existing studies have explored data-level optimization for TAGs, but most focus on specific degradation types and target a single aspect such as structure or labels, and thus lack a systematic and comprehensive perspective on data quality improvement. To address this gap, we propose LAGA (Large Language and Graph Agent), a unified multi-agent framework for comprehensive TAG quality optimization. LAGA formulates graph quality control as a data-centric process, integrating detection, planning, action, and evaluation agents into an automated loop. It holistically enhances textual, structural, and label aspects through coordinated multi-modal optimization. Extensive experiments on 5 datasets and 16 baselines across 9 scenarios demonstrate the effectiveness, robustness, and scalability of LAGA, confirming the importance of data-centric quality optimization for reliable TAG analytics.
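The automated loop of detection, planning, action, and evaluation can be illustrated with a minimal Python sketch. This is not LAGA's implementation: the `TAG` class, the function names, and the rule-based heuristics (empty text, self-loops, missing labels, neighbor majority vote) are hypothetical stand-ins for the LLM-powered agents, chosen only to show how the four stages might be chained into a closed loop over the text, structure, and label aspects.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TAG:
    """Toy text-attributed graph (hypothetical): node texts, node labels, undirected edges."""
    texts: Dict[int, str]
    labels: Dict[int, Optional[str]]
    edges: Set[Tuple[int, int]] = field(default_factory=set)


def detect(g: TAG) -> Dict[str, list]:
    """Detection stage: flag defects per aspect using toy rules (assumption, not LLM-based)."""
    return {
        "text": [n for n, t in g.texts.items() if not t.strip()],
        "structure": [e for e in g.edges if e[0] == e[1]],
        "label": [n for n, y in g.labels.items() if y is None],
    }


def plan(defects: Dict[str, list]) -> List[str]:
    """Planning stage: schedule a repair for every aspect that has at least one defect."""
    return [aspect for aspect, items in defects.items() if items]


def neighbor_labels(g: TAG, n: int) -> List[str]:
    """Collect the known labels of the neighbors of node n, ignoring self-loops."""
    found = []
    for u, v in g.edges:
        if u == n and v != n:
            found.append(g.labels[v])
        elif v == n and u != n:
            found.append(g.labels[u])
    return [y for y in found if y is not None]


def act(g: TAG, defects: Dict[str, list], steps: List[str]) -> None:
    """Action stage: apply the planned repairs in place."""
    for step in steps:
        if step == "text":
            for n in defects["text"]:
                g.texts[n] = "[placeholder text]"  # stand-in for generated text
        elif step == "structure":
            g.edges -= set(defects["structure"])  # drop the flagged self-loops
        elif step == "label":
            for n in defects["label"]:
                votes = neighbor_labels(g, n)
                if votes:  # majority vote over labeled neighbors
                    g.labels[n] = Counter(votes).most_common(1)[0][0]


def evaluate(g: TAG) -> int:
    """Evaluation stage: count the defects that remain (lower is better)."""
    return sum(len(items) for items in detect(g).values())


def improve(g: TAG, max_rounds: int = 3) -> List[int]:
    """Closed loop: detect, plan, act, evaluate; stop when no repair is planned."""
    history = []
    for _ in range(max_rounds):
        defects = detect(g)
        steps = plan(defects)
        if not steps:
            break
        act(g, defects, steps)
        history.append(evaluate(g))
    return history
```

Calling `improve` on a small `TAG` with one empty text, one self-loop, and one unlabeled node repairs all three aspects and returns the remaining defect count after each round. In a real system the evaluation signal would more plausibly come from downstream model performance than from a defect count; the count is used here only to keep the sketch self-contained.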
