Dual Refinement Cycle Learning: Unsupervised Text Classification of Mamba and Community Detection on Text Attributed Graph

Hong Wang, Yinglong Zhang, Hanhan Guo, Xuewen Xia, Xing Xu

TL;DR

This work tackles unsupervised text classification on text-attributed graphs where the number of categories is unknown. It introduces Dual Refinement Cycle Learning (DRCL), a closed-loop framework that jointly optimizes a GCN-based Community Detection Module and a Mamba-based Text Semantic Modeling Module. The two modules exchange pseudo-labels, so that semantics refine the structural clustering and structure guides the text representations. A warm start via Louvain supplies the initial proto-community signals, and two forms of proto-signals enable iterative refinement without annotations. Empirical results across six datasets show that DRCL consistently improves both community quality and text classification, with Mamba achieving competitive or superior performance using only the generated community signals. The approach offers a practical path to scalable, annotation-free deployment in real-world text–graph systems, and the code is openly available.

Abstract

Pretrained language models offer strong text understanding capabilities but remain difficult to deploy in real-world text-attributed networks due to their heavy dependence on labeled data. Meanwhile, community detection methods typically ignore textual semantics, limiting their usefulness in downstream applications such as content organization, recommendation, and risk monitoring. To overcome these limitations, we present Dual Refinement Cycle Learning (DRCL), a fully unsupervised framework designed for practical scenarios where no labels or category definitions are available. DRCL integrates structural and semantic information through a warm-start initialization and a bidirectional refinement cycle between a GCN-based Community Detection Module (GCN-CDM) and a Text Semantic Modeling Module (TSMM). The two modules iteratively exchange pseudo-labels, allowing semantic cues to enhance structural clustering and structural patterns to guide text representation learning without manual supervision. Across several text-attributed graph datasets, DRCL consistently improves the structural and semantic quality of discovered communities. Moreover, a Mamba-based classifier trained solely from DRCL's community signals achieves accuracy comparable to supervised models, demonstrating its potential for deployment in large-scale systems where labeled data are scarce or costly. The code is available at https://github.com/wuanghoong/DRCL.git.
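
The warm start and the bidirectional pseudo-label exchange described above can be pictured with a toy loop. What follows is only a minimal sketch under stated assumptions, not the authors' implementation: a word-overlap scorer stands in for the Mamba-based TSMM, a neighbor majority vote stands in for the GCN-CDM, and the warm-start labels are supplied by hand where DRCL would obtain them from Louvain. All function names, the toy graph, and the texts are hypothetical.

```python
from collections import Counter


def text_step(texts, labels):
    """Stand-in for the text module (NOT Mamba): build a word-count profile
    per pseudo-label, then relabel each text by its best-matching profile."""
    profiles = {}
    for text, lab in zip(texts, labels):
        profiles.setdefault(lab, Counter()).update(text.split())

    new_labels = []
    for text in texts:
        words = text.split()
        # Score = share of the profile's word mass covered by this text's words.
        new_labels.append(max(
            sorted(profiles),
            key=lambda lab: sum(profiles[lab][w] for w in words)
            / (sum(profiles[lab].values()) or 1),
        ))
    return new_labels


def structure_step(adj, labels):
    """Stand-in for the structural module (NOT a GCN): each node adopts the
    majority label among itself and its neighbors (ties -> smallest label)."""
    new_labels = []
    for node, nbrs in enumerate(adj):
        votes = Counter([labels[node]] + [labels[n] for n in nbrs])
        new_labels.append(max(sorted(votes), key=lambda lab: votes[lab]))
    return new_labels


def refinement_cycle(adj, texts, init_labels, max_rounds=10):
    """Alternate the two steps, passing pseudo-labels back and forth,
    until the community assignment stops changing."""
    labels = list(init_labels)
    for _ in range(max_rounds):
        text_labels = text_step(texts, labels)         # structure -> semantics
        new_labels = structure_step(adj, text_labels)  # semantics -> structure
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
texts = [
    "graph neural network",
    "graph community network",
    "graph clustering network",
    "protein gene cell",
    "protein cell disease",
    "gene disease cell",
]
# Hand-written noisy warm start (assumed; DRCL uses Louvain here):
# node 2 starts in the wrong community.
warm_start = [0, 0, 1, 1, 1, 1]

refined = refinement_cycle(adj, texts, warm_start)
print(refined)  # [0, 0, 0, 1, 1, 1]
```

In this toy run the text step moves node 2 back to the community whose vocabulary it shares, and the structural vote then confirms that assignment. The point is only to show how the pseudo-labels circulate between the two sides; the actual modules are trained neural networks.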

Paper Structure

This paper contains 30 sections, 20 equations, 6 figures, 4 tables, 3 algorithms.

Figures (6)

  • Figure 1: DRCL workflow diagram
  • Figure 2: DRCL Framework
  • Figure 3: Cora
  • Figure 4: Pubmed
  • Figure 6: Cora
  • ...and 1 more figure