GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs
Liangwei Yang, Jing Ma, Jianguo Zhang, Zhiwei Liu, Jielin Qiu, Shirley Kokane, Shiyu Wang, Haolin Chen, Rithesh Murthy, Ming Zhu, Huan Wang, Weiran Yao, Caiming Xiong, Shelby Heinecke
TL;DR
This work identifies semantic drift as a key limitation when applying traditional linear neighborhood aggregation to PLM-derived embeddings in text-attributed graphs. It introduces a local PCA-based metric to quantify drift and proposes Geodesic Aggregation, which operates along geodesics on the semantic manifold via log–exp maps on the unit sphere, forming the GeoGNN model with spherical attention. Across four CS-TAG datasets and diverse encoders, GeoGNN consistently outperforms strong baselines and exhibits reduced manifold distortion, validating the value of geometry-aware message passing. The findings highlight the importance of respecting the intrinsic geometry of textual representations for robust and scalable text–graph learning.
Abstract
Graph neural networks (GNNs) on text-attributed graphs (TAGs) typically encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. However, the representation spaces of modern PLMs are highly nonlinear and geometrically structured: textual embeddings reside on curved semantic manifolds rather than in flat Euclidean spaces. Linear aggregation on such manifolds inevitably distorts geometry and causes semantic drift, a phenomenon in which aggregated representations deviate from the intrinsic manifold and lose semantic fidelity and expressive power. To investigate this problem quantitatively, we introduce a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework for analyzing how different aggregation mechanisms affect manifold structure. Building on these insights, we propose Geodesic Aggregation, a manifold-aware mechanism that aggregates neighbor information along geodesics via log-exp mappings on the unit sphere, ensuring that representations remain faithful to the semantic manifold during message passing. We further develop GeoGNN, a practical instantiation that integrates spherical attention with manifold interpolation. Extensive experiments across four benchmark datasets and multiple text encoders show that GeoGNN substantially mitigates semantic drift and consistently outperforms strong baselines, establishing the importance of manifold-aware aggregation in text-attributed graph learning.
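To make the log-exp idea concrete, the following is a minimal NumPy sketch of one geodesic aggregation step on the unit sphere. It is an illustration under stated assumptions, not the authors' implementation: the function names (log_map, exp_map, geodesic_aggregate) are hypothetical, the weights stand in for whatever the spherical attention would produce, and a single tangent-space averaging step at the center node is assumed. Neighbors are lifted into the tangent space at the center node with the log map, averaged there, and mapped back with the exp map, so the result has unit norm, whereas a plain linear mean of unit vectors generally falls inside the sphere.

```python
import numpy as np


def log_map(x, ys, eps=1e-8):
    """Log map on the unit sphere at base point x.

    x: (d,) unit vector; ys: (k, d) unit vectors.
    Returns (k, d) tangent vectors at x whose lengths equal the
    geodesic (angular) distances from x to each row of ys.
    """
    cos_theta = np.clip(ys @ x, -1.0, 1.0)      # (k,)
    theta = np.arccos(cos_theta)                # geodesic distances
    u = ys - cos_theta[:, None] * x             # part of ys orthogonal to x
    norm_u = np.linalg.norm(u, axis=1)
    # If a neighbor coincides with x (or is antipodal to it), the direction
    # is undefined; this sketch simply assigns it a zero tangent vector.
    scale = np.where(norm_u > eps, theta / np.maximum(norm_u, eps), 0.0)
    return scale[:, None] * u


def exp_map(x, v, eps=1e-8):
    """Exp map on the unit sphere: walk from x along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)


def geodesic_aggregate(x, neighbors, weights):
    """Weighted tangent-space average of neighbors, mapped back to the sphere.

    The weights are assumed to be non-negative (e.g. attention scores) and
    are normalized to sum to one here.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    v = (weights[:, None] * log_map(x, neighbors)).sum(axis=0)
    out = exp_map(x, v)
    return out / np.linalg.norm(out)            # guard against round-off


if __name__ == "__main__":
    x = np.array([1.0, 0.0, 0.0])
    neighbors = np.array([[0.0, 1.0, 0.0],
                          [0.0, 0.0, 1.0]])
    w = np.array([0.5, 0.5])

    geo = geodesic_aggregate(x, neighbors, w)
    lin = (w[:, None] * neighbors).sum(axis=0)

    print("geodesic norm:", np.linalg.norm(geo))  # stays on the sphere
    print("linear norm:  ", np.linalg.norm(lin))  # shrinks below 1
```

In this toy case the linear mean of the two neighbors has norm below one, which is a simple instance of an aggregate leaving the sphere, while the geodesic aggregate remains a unit vector. Real PLM embeddings would first need to be L2-normalized for the spherical assumption to hold.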
