CoT-Driven Framework for Short Text Classification: Enhancing and Transferring Capabilities from Large to Smaller Model
Hui Wu, Yuanben Zhang, Zhonghe Han, Yingyan Hou, Lei Wang, Siye Liu, Qihang Gong, Yunping Ge
TL;DR
This paper tackles short text classification (STC) by leveraging Chain-of-Thought (CoT) prompts in large language models (LLMs) and transferring those capabilities to smaller models. It introduces SSE-CoT, a four-step reasoning framework that decomposes STC into key concept identification, common-sense retrieval, text rewriting, and final classification, and reports state-of-the-art results across six benchmarks, notably on Ohsumed and TagMyNews. To enable practical deployment in resource-constrained settings, it proposes the CoT-Driven Multi-Task Learning (CDMT) framework, which distills LLM reasoning into a smaller model via rationale generation and a multi-task objective with explicit category context augmentation (ECCA). Extensive experiments compare against PLMs, GCN-based STC methods, and other LLM baselines, showing robust gains for SSE-CoT and meaningful improvements for CDMT, while also revealing dataset-dependent dynamics and trade-offs in time complexity. The work suggests a practical path to deploying CoT-enhanced STC in real-world systems, combining improved semantic-syntactic understanding with reasoning that transfers to smaller models.
Abstract
Short Text Classification (STC) is crucial for processing and understanding the brief but substantial content prevalent on contemporary digital platforms. STC struggles to capture semantic and syntactic intricacies, a difficulty that is apparent in traditional pre-trained language models. Although Graph Convolutional Networks enhance performance by integrating external knowledge bases, these methods are limited by the quality and extent of the knowledge applied. Recently, the emergence of Large Language Models (LLMs) and Chain-of-Thought (CoT) prompting has significantly improved performance on complex reasoning tasks. However, some studies have highlighted the limitations of their application to fundamental NLP tasks. Consequently, this study first employs CoT to investigate and enhance the capabilities of LLMs on STC tasks. We propose the Syntactic and Semantic Enrichment CoT (SSE-CoT) method, which decomposes the STC task into four distinct steps: (i) essential concept identification, (ii) common-sense knowledge retrieval, (iii) text rewriting, and (iv) classification. Furthermore, recognizing resource constraints in sectors like finance and healthcare, we introduce the CoT-Driven Multi-Task Learning (CDMT) framework to extend these capabilities to smaller models. This framework begins by extracting rationales from LLMs and subsequently fine-tunes smaller models to optimize their performance. Extensive experiments across six short-text benchmarks validate the efficacy of the proposed methods. In particular, SSE-CoT achieves state-of-the-art performance with substantial improvements on all datasets, most notably on Ohsumed and TagMyNews.
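The four SSE-CoT steps can be read as a chain of prompts in which each step consumes the output of the previous one. The sketch below illustrates that reading only; it is not the authors' implementation. The prompt wording, the function name `sse_cot_classify`, and the `llm` callable (any function mapping a prompt string to a completion string) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interface: a prompt goes in, a text completion comes out.
LLM = Callable[[str], str]


def sse_cot_classify(text: str, labels: List[str], llm: LLM) -> Dict[str, Optional[str]]:
    """Run a four-step prompt chain loosely following the SSE-CoT steps.

    The prompt wording is illustrative, not taken from the paper.
    """
    # Step (i): essential concept identification.
    concepts = llm(f"List the essential concepts in this short text:\n{text}")

    # Step (ii): common-sense knowledge retrieval for those concepts.
    knowledge = llm(
        f"Give brief common-sense knowledge about these concepts:\n{concepts}"
    )

    # Step (iii): rewrite the text, enriched with the retrieved knowledge.
    rewritten = llm(
        "Rewrite the text so that it is clearer and more complete, "
        "using the knowledge provided.\n"
        f"Text: {text}\nKnowledge: {knowledge}"
    )

    # Step (iv): classify the rewritten text into one of the given labels.
    answer = llm(
        f"Classify the text into exactly one of: {', '.join(labels)}.\n"
        f"Text: {rewritten}\nAnswer with the label only."
    )

    # Map the free-form answer back onto the label set (None if nothing matches).
    answer_lower = answer.strip().lower()
    label = next((l for l in labels if l.lower() in answer_lower), None)

    return {
        "concepts": concepts,
        "knowledge": knowledge,
        "rewritten": rewritten,
        "label": label,
    }
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client. Returning the intermediate outputs alongside the label also exposes the kind of step-by-step rationale that a distillation stage such as CDMT could, in principle, collect for training a smaller model.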
