Odin: Oriented Dual-module Integration for Text-rich Network Representation Learning

Kaifeng Hong, Yinglong Zhang, Xiaoying Hong, Xuewen Xia, Xing Xu

TL;DR

Odin introduces a layer-aligned dual-module fusion that inserts graph aggregation at strategically chosen Transformer depths, fusing text with multi-hop graph structure without the diffusion issues of GNNs. By alternating TG (graph+text) and TS (text+simple-aggregation) layers, Odin provides depth-consistent, hop-free structural guidance, supported by theoretical guarantees on over-smoothing avoidance and expressive power. Extensive experiments on five TAG (text-attributed graph) datasets and four tasks show state-of-the-art performance, and Light Odin offers substantial efficiency gains for scalable deployment. The work provides a unified framework for principled structure-text integration, and its source code is released for reproducibility.
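The TG/TS alternation can be pictured as a per-layer schedule. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `build_layer_schedule` and `depth_band` are hypothetical, the TG depths are passed in by hand rather than chosen as in the paper, and splitting the depth into equal low/mid/high thirds is an assumption made only for this example.

```python
from typing import List, Sequence, Tuple


def build_layer_schedule(num_layers: int, tg_layers: Sequence[int]) -> List[str]:
    """Label each Transformer layer 'TG' (graph+text) or 'TS' (text+simple aggregation).

    Hypothetical helper: the caller supplies the TG depths; this does not
    reproduce how the paper selects them.
    """
    tg = set(tg_layers)
    if any(i < 0 or i >= num_layers for i in tg):
        raise ValueError("TG layer index out of range")
    return ["TG" if i in tg else "TS" for i in range(num_layers)]


def depth_band(layer: int, num_layers: int) -> str:
    """Assumed mapping of a layer index to a low/mid/high band (equal thirds)."""
    third = num_layers / 3
    if layer < third:
        return "low"
    if layer < 2 * third:
        return "mid"
    return "high"


# Illustrative 12-layer encoder with one TG layer placed in each band.
schedule = build_layer_schedule(12, tg_layers=[3, 7, 11])
tg_bands: List[Tuple[int, str]] = [
    (i, depth_band(i, len(schedule))) for i, kind in enumerate(schedule) if kind == "TG"
]
print(schedule)
print(tg_bands)
```

In this sketch, the TS layers between consecutive TG layers keep refining text features before structure is injected again at the next chosen depth.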

Abstract

Text-attributed graphs require models that combine strong textual understanding with structurally informed reasoning. Existing approaches either rely on GNNs, which are limited by over-smoothing and hop-dependent diffusion, or employ Transformers that overlook graph topology and treat nodes as isolated sequences. We propose Odin (Oriented Dual-module INtegration), a new architecture that injects graph structure into Transformers at selected depths through an oriented dual-module mechanism. Unlike message-passing GNNs, Odin does not rely on multi-hop diffusion; instead, multi-hop structures are integrated at specific Transformer layers, yielding low-, mid-, and high-level structural abstractions aligned with the model's semantic hierarchy. Because aggregation operates on the global [CLS] representation, Odin avoids over-smoothing by design and decouples structural abstraction from neighborhood size and graph topology. We further establish that Odin's expressive power strictly contains that of both pure Transformers and GNNs. For efficiency in large-scale or low-resource settings, we introduce Light Odin, a lightweight variant that preserves the same layer-aligned structural abstraction while training and running inference faster. Experiments on multiple text-rich graph benchmarks show that Odin achieves state-of-the-art accuracy, while Light Odin delivers competitive performance at significantly lower computational cost. Together, Odin and Light Odin form a unified, hop-free framework for principled structure-text integration. The source code has been released at https://github.com/hongkaifeng/Odin.
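The [CLS]-level aggregation described in the abstract can be sketched with NumPy. This is a minimal illustration under explicit assumptions, not the released Odin code: `k_hop_adjacency`, `neighbor_mean`, and `fuse_cls` are hypothetical names, the plain neighbor mean and fixed mixing weight `alpha` stand in for whatever learned operators Odin actually uses, and a node's neighbors within k hops are gathered in a single step rather than through k rounds of message passing. What it shows is that only position 0 (the [CLS] slot) of each node's token sequence receives structural information; all other token states pass through unchanged.

```python
import numpy as np


def k_hop_adjacency(adj: np.ndarray, k: int) -> np.ndarray:
    """Binary matrix marking nodes reachable within k hops (self excluded)."""
    n = adj.shape[0]
    step = (adj > 0).astype(np.int64)
    reach = np.eye(n, dtype=np.int64)
    within = np.zeros((n, n), dtype=bool)
    for _ in range(k):
        # Nodes reachable in exactly one more hop from the current frontier.
        reach = ((reach @ step) > 0).astype(np.int64)
        within |= reach.astype(bool)
    np.fill_diagonal(within, False)
    return within.astype(float)


def neighbor_mean(cls: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Mean of neighbors' [CLS] vectors; rows of adj index each node's neighbors."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # isolated nodes get a zero aggregate
    return (adj @ cls) / deg


def fuse_cls(hidden: np.ndarray, adj: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Mix each node's [CLS] state with its neighbors' [CLS] states.

    hidden: (num_nodes, seq_len, dim) token states, with [CLS] at position 0.
    Returns a copy; non-[CLS] token positions are left untouched.
    """
    out = hidden.copy()
    cls = hidden[:, 0, :]
    out[:, 0, :] = (1 - alpha) * cls + alpha * neighbor_mean(cls, adj)
    return out


# Toy path graph 0-1-2-3; each node has a 5-token sequence of 8-dim states.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 5, 8))
fused = fuse_cls(hidden, k_hop_adjacency(adj, 2), alpha=0.5)
```

Because the sketch only rewrites the [CLS] slot and mixes in the node's own state, repeated applications do not overwrite token-level text features, which loosely mirrors the abstract's argument for aggregating on [CLS] rather than diffusing over all node features.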

Paper Structure

This paper contains 33 sections, 30 equations, 2 figures, 12 tables, and 1 algorithm.

Figures (2)

  • Figure 1: An illustration of a text-rich network where multi-hop neighborhood information provides crucial contextual cues for semantic interpretation.
  • Figure 2: Odin Model Structure Diagram