Text Classification using Graph Convolutional Networks: A Comprehensive Survey
Syed Mustafa Haider Rizvi, Ramsha Imran, Arif Mahmood
TL;DR
This survey consolidates the state of text classification with Graph Convolutional Networks, tracing developments from the original TextGCN to modern hybrids that fuse GCNs with BERT and large language models. It systematically categorizes methods by supervision (supervised, semi-supervised, self-supervised, and weakly supervised) and by architecture (fundamental GCNs vs. GCNs integrated with generative models), and it benchmarks performance on standard datasets to illuminate strengths and limitations. The authors emphasize the shift from purely supervised approaches to semi- and self-supervised strategies that leverage unlabeled data, as well as the growing trend of integrating GCNs with transformers and LLMs to capture both global graph structure and rich contextual information. The work offers actionable directions for handling data scarcity, improving graph augmentation and diffusion, and enabling online/inductive learning and cross-lingual transfer, with clear implications for real-world NLP systems and future research agendas.
Abstract
Text classification is a quintessential and practical problem in natural language processing, with applications in diverse domains such as sentiment analysis, fake news detection, medical diagnosis, and document classification. A sizable body of recent work has tackled text classification from different angles with varying degrees of success. Graph convolutional network (GCN)-based approaches have gained considerable traction in this domain over the last decade, with many implementations achieving state-of-the-art performance in recent literature, which warrants an updated survey. This work summarizes and categorizes GCN-based text classification approaches by architecture and mode of supervision. It identifies their strengths and limitations and compares their performance on various benchmark datasets. We also discuss future research directions and the challenges that exist in this domain.
