TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations
Zheng Zhang, Yuntong Hu, Bo Pan, Chen Ling, Liang Zhao
TL;DR
TAGA tackles unsupervised representation learning for Text-Attributed Graphs (TAGs) by aligning two mutually informative views of the same data: Text-of-Graph (TofG) and Graph-of-Text (GoT). For the Text-of-Graph view, a Graph2Text module converts neighborhood structures into hierarchical documents, with HDL preserving graph topology in the text; for the Graph-of-Text view, a GNN processes the graph directly. The two views are aligned with a hierarchical self-supervised loss $L = L_{\text{positive}} + L_{\text{negative}}$ so that the learned representations capture joint textual and structural semantics. To scale to large TAGs, TAGA introduces a structure-preserving random walk that mimics human reading and reduces the cost of processing long text, making training and inference more efficient. Empirically, TAGA achieves strong zero-shot and few-shot performance across eight real-world datasets, improves substantially over both graph-only pre-training and PLM baselines, and transfers robustly between domains. Together, dual-view alignment, HDL, and efficient training make TAGA a practical and scalable framework for universal TAG representations.
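To make the two-term loss concrete, here is a minimal sketch of a loss of the form $L = L_{positive} + L_{negative}$ over paired view embeddings. The exact terms are not given in this summary, so everything below is an assumption for illustration: the function names `cosine` and `alignment_loss` are hypothetical, the positive term is taken to be one minus the cosine similarity between the two views of the same node, and the negative term is taken to be the clamped cosine similarity between views of different nodes. It also ignores the hierarchical aspect and works on a single level.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(text_view, graph_view):
    """Illustrative L = L_positive + L_negative (assumed form, not the paper's).

    text_view[i] and graph_view[i] are embeddings of the same node
    produced by the Text-of-Graph and Graph-of-Text encoders.
    Returns (total, positive, negative).
    """
    n = len(text_view)

    # Positive term: pull the two views of the same node together.
    positive = sum(
        1.0 - cosine(text_view[i], graph_view[i]) for i in range(n)
    ) / n

    # Negative term: push apart views that belong to different nodes.
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    if pairs:
        negative = sum(
            max(0.0, cosine(text_view[i], graph_view[j])) for i, j in pairs
        ) / len(pairs)
    else:
        negative = 0.0

    return positive + negative, positive, negative
```

Under these assumptions the loss is zero when matching views coincide and mismatched views are orthogonal, and it grows as the pairing between the two views degrades.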
Abstract
Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions, enabling detailed representation of data and their relationships across a broad spectrum of real-world scenarios. Despite the potential for deeper insights, existing TAG representation learning relies primarily on supervised methods, which require extensive labeled data and limit applicability across diverse contexts. This paper introduces a new self-supervised learning framework, Text-And-Graph Multi-View Alignment (TAGA), which overcomes these constraints by integrating the structural and semantic dimensions of TAGs. TAGA constructs two complementary views: the Text-of-Graph view, which organizes node texts into structured documents based on graph topology, and the Graph-of-Text view, which converts textual nodes and connections into graph data. By aligning representations from both views, TAGA captures joint textual and structural information. In addition, a novel structure-preserving random walk algorithm is proposed for efficient training on large TAGs. Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.
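As a rough illustration of the Text-of-Graph idea, the sketch below organizes node texts into a nested document that follows graph topology. The layout is an assumption made for this example, not the paper's actual format: the helper names `tofg_lines` and `text_of_graph`, the `[node] text` line format, and indentation as the marker of hop distance are all hypothetical.

```python
def tofg_lines(node, texts, neighbors, depth, level=0, path=()):
    """Return the document lines for `node` and its neighborhood.

    Each hop away from the root adds one level of indentation, so the
    nesting of the document mirrors the structure of the neighborhood.
    Nodes already on the current path are skipped to avoid cycles.
    """
    lines = ["  " * level + f"[{node}] {texts[node]}"]
    if depth > 0:
        for nb in neighbors.get(node, []):
            if nb != node and nb not in path:
                lines += tofg_lines(
                    nb, texts, neighbors, depth - 1, level + 1, path + (node,)
                )
    return lines


def text_of_graph(node, texts, neighbors, depth=2):
    """Build a hierarchical text document for the `depth`-hop neighborhood."""
    return "\n".join(tofg_lines(node, texts, neighbors, depth))


if __name__ == "__main__":
    # Toy path graph a - b - c with made-up node texts.
    texts = {"a": "GNN paper", "b": "PLM paper", "c": "TAG survey"}
    neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(text_of_graph("a", texts, neighbors, depth=2))
```

A document built this way can be passed to a text encoder, while the same `texts` and `neighbors` can feed a GNN for the Graph-of-Text view; the two resulting embeddings per node are what the alignment objective compares.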
