Text Classification via Large Language Models

Xiaofei Sun; Xiaoya Li; Jiwei Li; Fei Wu; Shangwei Guo; Tianwei Zhang; Guoyin Wang

TL;DR

CARP addresses the gap where in-context learning with LLMs underperforms fine-tuned models on text classification due to limited reasoning and token budget. It introduces a progressive clue-and-reasoning prompting framework that first extracts superficial clues, then induces diagnostic reasoning before final decision, and it leverages a kNN-based demonstration retrieval guided by a fine-tuned model to compensate for context limits. The approach achieves state-of-the-art results on four of five standard benchmarks and demonstrates strong performance in low-resource and domain-adaptation settings, highlighting practical data efficiency and robustness. Overall, CARP shows that structured, explainable prompting combined with task-aligned retrieval can significantly boost LLM-based text classification.
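The clue-then-reasoning-then-decision order described above can be illustrated with a small prompt builder. This is a minimal sketch, not the paper's actual template: the `Demo` class, the `build_carp_prompt` function, and the field names `INPUT`, `CLUES`, `REASONING`, and `LABEL` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One worked demonstration (hypothetical structure, not from the paper)."""
    text: str
    clues: str
    reasoning: str
    label: str


def build_carp_prompt(task_desc: str, demos: List[Demo], input_text: str) -> str:
    """Assemble a few-shot prompt: task description, demonstrations, then the input.

    Each demonstration lists clues first, reasoning second, and the label last,
    so the model is nudged to follow the same progression for the new input.
    """
    parts = [task_desc.strip(), ""]
    for d in demos:
        parts += [
            f"INPUT: {d.text}",
            f"CLUES: {d.clues}",
            f"REASONING: {d.reasoning}",
            f"LABEL: {d.label}",
            "",
        ]
    # Leave the prompt open at the clue step; the model continues from here.
    parts += [f"INPUT: {input_text}", "CLUES:"]
    return "\n".join(parts)


demo = Demo(
    text="great movie",
    clues="'great'",
    reasoning="The word 'great' expresses approval.",
    label="positive",
)
prompt = build_carp_prompt("Classify the sentiment of the INPUT.", [demo], "dull plot")
print(prompt)
```

Ending the prompt at `CLUES:` is one possible design choice: it asks the model to produce clues, reasoning, and a label in a single completion rather than in separate calls.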

Abstract

Despite the remarkable success of large language models (LLMs) such as GPT-3, their performance still falls significantly short of fine-tuned models on text classification. This is due to (1) a lack of reasoning ability for handling complex linguistic phenomena (e.g., intensification, contrast, irony); and (2) the limited number of tokens allowed in in-context learning. In this paper, we introduce Clue And Reasoning Prompting (CARP). CARP adopts a progressive reasoning strategy tailored to the complex linguistic phenomena involved in text classification: it first prompts LLMs to find superficial clues (e.g., keywords, tones, semantic relations, references), based on which a diagnostic reasoning process is induced for final decisions. To further address the limited-token issue, CARP uses a model fine-tuned on the supervised dataset for $k$NN demonstration search in in-context learning, allowing the model to take advantage of both the LLM's generalization ability and the task-specific evidence provided by the full labeled dataset. Remarkably, CARP yields new SOTA performance on 4 out of 5 widely used text-classification benchmarks: 97.39 (+1.24) on SST-2, 96.40 (+0.72) on AGNews, 98.78 (+0.25) on R8, and 96.95 (+0.6) on R52, with performance comparable to SOTA on MR (92.39 vs. 93.3). More importantly, we find that CARP performs strongly in low-resource and domain-adaptation setups. Specifically, using 16 examples per class, CARP achieves performance comparable to supervised models trained with 1,024 examples per class.
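The $k$NN demonstration search mentioned in the abstract can be sketched as a nearest-neighbor lookup over sentence embeddings. The sketch below assumes the embeddings have already been produced by the fine-tuned model and that cosine similarity is the ranking metric; the abstract does not specify the metric, and the names `cosine` and `knn_demonstrations` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_demonstrations(
    query_vec: Sequence[float],
    train_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k training examples most similar to the query.

    Assumption: train_vecs holds one embedding per labeled training example,
    computed in advance by the fine-tuned model.
    """
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query_vec, train_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings standing in for fine-tuned model outputs.
train_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = knn_demonstrations([1.0, 0.1], train_vecs, k=2)
print(neighbors)
```

The returned indices would select which labeled examples to place in the prompt as demonstrations, so only the most relevant ones consume the limited token budget.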

Paper Structure (45 sections, 2 equations, 4 figures, 16 tables)

This paper contains 45 sections, 2 equations, 4 figures, 16 tables.

Introduction
Related Work
Large Language Models
In-context Learning
Text Classification
Prompt Construction
Overview
Prompt Construction
(1) Task description $\bm{x}_\textit{desc}$
(2) Demonstration
(3) Input $\bm{x}_\textit{input}$
Demonstration Sampling
Random Sampling
SimCSE
Finetuned Model
...and 30 more sections

Figures (4)

Figure 1: Examples of zero-shot prompting methods for the text classification task: (a) the vanilla prompting method; (b) the Chain-of-Thought (CoT) prompting method (Kojima et al., 2022); (c) the proposed CARP prompting method.
Figure 2: Examples of few-shot ($k$=1) prompting methods for the text classification task: (a) the vanilla prompting method; (b) the Chain-of-Thought (CoT) prompting method (Kojima et al., 2022); (c) the proposed CARP prompting method.
Figure 3: Performance vs. the number of demonstrations in few-shot prompts.
Figure 4: Performance vs. the number of demonstrations in few-shot prompts for the CARP strategy, where LLMs are first asked to generate evidence, then to reason, and finally to generate the final result.
