CoT-Driven Framework for Short Text Classification: Enhancing and Transferring Capabilities from Large to Smaller Model

Hui Wu, Yuanben Zhang, Zhonghe Han, Yingyan Hou, Lei Wang, Siye Liu, Qihang Gong, Yunping Ge

TL;DR

This paper tackles short text classification (STC) by applying Chain-of-Thought (CoT) prompting to large language models (LLMs) and transferring the resulting capabilities to smaller models. It introduces SSE-CoT, a four-step reasoning framework that decomposes STC into key concept identification, common-sense knowledge retrieval, text rewriting, and final classification. SSE-CoT achieves state-of-the-art results across six benchmarks, notably on Ohsumed and TagMyNews. For deployment in resource-constrained settings, the paper proposes the CoT-Driven Multi-Task Learning (CDMT) framework, which distills LLM reasoning into a smaller model via rationale generation and a multi-task objective with explicit category context augmentation (ECCA). Experiments compare against pre-trained language models (PLMs), GCN-based STC methods, and other LLM baselines. They show robust gains for SSE-CoT and meaningful improvements for CDMT, while also revealing dataset-dependent behavior and trade-offs in time complexity. The work suggests a practical path to deploying CoT-enhanced STC in real-world systems, combining improved semantic and syntactic understanding with reasoning that transfers to smaller models.

Abstract

Short Text Classification (STC) is crucial for processing and understanding the brief but substantial content prevalent on contemporary digital platforms. STC is difficult because models must grasp the semantic and syntactic intricacies of very short inputs, a weakness that is apparent in traditional pre-trained language models. Although Graph Convolutional Networks enhance performance by integrating external knowledge bases, these methods are limited by the quality and extent of the knowledge applied. Recently, the emergence of Large Language Models (LLMs) and Chain-of-Thought (CoT) has significantly improved performance on complex reasoning tasks. However, some studies have highlighted the limitations of their application in fundamental NLP tasks. Consequently, this study first employs CoT to investigate and enhance the capabilities of LLMs in STC tasks. We propose the Syntactic and Semantic Enrichment CoT (SSE-CoT) method, which decomposes the STC task into four distinct steps: (i) essential concept identification, (ii) common-sense knowledge retrieval, (iii) text rewriting, and (iv) classification. Furthermore, recognizing resource constraints in sectors like finance and healthcare, we then introduce the CoT-Driven Multi-Task Learning (CDMT) framework to extend these capabilities to smaller models. This framework begins by extracting rationales from LLMs and subsequently fine-tunes smaller models to optimize their performance. Extensive experimentation across six short-text benchmarks validated the efficacy of the proposed methods. In particular, SSE-CoT achieved state-of-the-art performance with substantial improvements on all datasets, particularly on the Ohsumed and TagMyNews datasets.
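
The four SSE-CoT steps listed in the abstract can be pictured as a single structured prompt. The following is a minimal sketch, not the authors' prompt: the step wording, the function name `build_sse_cot_prompt`, and the constant `SSE_COT_STEPS` are assumptions made for illustration. The example text and the two category names come from the paper's Figure 1 caption.

```python
from typing import Sequence

# Hypothetical wording for the four SSE-CoT steps named in the abstract.
# The paper's actual prompt text is not reproduced here.
SSE_COT_STEPS = (
    "Identify the essential concepts in the text.",
    "Retrieve common-sense knowledge about those concepts.",
    "Rewrite the text so that it is complete and unambiguous.",
    "Classify the rewritten text into exactly one of the categories.",
)


def build_sse_cot_prompt(text: str, categories: Sequence[str]) -> str:
    """Assemble one prompt that walks an LLM through the four steps in order."""
    if not categories:
        raise ValueError("at least one category is required")
    lines = [
        f'Short text: "{text}"',
        "Categories: " + ", ".join(categories),
        "",
        "Reason step by step:",
    ]
    # Number the steps so the model's output can be split per step later.
    lines += [f"Step {i}: {step}" for i, step in enumerate(SSE_COT_STEPS, start=1)]
    lines.append("Give the category name alone on the final line.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sse_cot_prompt("Del Potro says make French Open", ["world", "sport"]))
```

Keeping the steps in one ordered tuple makes it easy to vary or ablate a single step without touching the rest of the prompt. Whether the steps are sent in one prompt or as separate calls is a design choice this sketch does not settle.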

Paper Structure

This paper contains 31 sections, 12 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: This diagram compares two approaches to STC tasks. The traditional approach misinterprets the input 'Del Potro says make French Open' and erroneously classifies it as 'world'. Conversely, our CoT method employs a sequential analytical process that correctly identifies 'Del Potro' as a tennis player, recognizes the 'French Open' as a tennis tournament, and detects the absence of a grammatical object in the sentence, resulting in accurate categorization under 'sport'.
  • Figure 2: This diagram presents the Syntactic and Semantic Enrichment CoT (SSE-CoT), as applied to the short text 'Del Potro says make French Open'. It begins by identifying the key concepts, 'Del Potro' and the 'French Open', then retrieves knowledge that contextualizes 'Del Potro' as a tennis player and the 'French Open' as a major tournament. The third step refines this information for accuracy and integration. Finally, the process classifies the outcome under 'sport'. The framework offers a novel solution that addresses the challenges of STC tasks.
  • Figure 3: Overview of the CDMT method. In the first stage, the framework employs SSE-CoT and DA-CoT to prompt an LLM with training data for rationale generation. In the second stage, the generated rationales guide the training of a smaller, specialized model. This stage involves multi-task fine-tuning with a supervised signal that consists of a label and two distinct rationales, one derived from SSE-CoT reasoning and one from DA-CoT reasoning.
  • Figure 4: The figure depicts the two-phase procedure of the DA-CoT method employed in the snippets domain. Initially, the method discerns essential text elements, including primary entities, actions, and events. Subsequently, it synthesizes the interconnections and collective importance of these elements, enhancing comprehension of their pertinence and consequences in the context of the text.
  • Figure 5: Performance evaluation of LLMs in zero-shot and one-shot settings is conducted using three representative datasets. The upper three groups correspond to zero-shot settings, while the lower three pertain to one-shot settings. In each figure, the best results are highlighted in bold.
  • ...and 7 more figures
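
The second CDMT stage described in the Figure 3 caption (multi-task fine-tuning on a label and two rationales) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the task prefixes, the function names, the weighted-sum form of the objective, and the way category names are appended to the input are all hypothetical. In particular, appending the category list is only one possible reading of "explicit category context augmentation" and is not confirmed by the text above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical task prefixes telling the small model which output to produce.
TASK_PREFIXES = {
    "label": "[label]",
    "sse": "[sse_rationale]",
    "da": "[da_rationale]",
}


def make_multitask_examples(
    text: str,
    label: str,
    sse_rationale: str,
    da_rationale: str,
    categories: Optional[Sequence[str]] = None,
) -> List[Tuple[str, str]]:
    """Turn one labeled text plus two LLM rationales into three (input, target) pairs."""
    context = text
    if categories:
        # Assumption: expose the candidate categories to the model as plain text.
        context = f"{text} | categories: {', '.join(categories)}"
    targets = {"label": label, "sse": sse_rationale, "da": da_rationale}
    return [(f"{TASK_PREFIXES[task]} {context}", target) for task, target in targets.items()]


def combined_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; the weights are assumed hyperparameters."""
    return sum(weights[task] * loss for task, loss in task_losses.items())
```

With this layout, the small model sees the same text three times per training example, once per task, and the rationale tasks can be dropped at inference time by using only the `[label]` prefix.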