Text Classification via Large Language Models

Xiaofei Sun, Xiaoya Li, Jiwei Li, Fei Wu, Shangwei Guo, Tianwei Zhang, Guoyin Wang

TL;DR

CARP addresses the gap between in-context learning with LLMs and fine-tuned models on text classification, a gap attributed to limited reasoning ability and a limited token budget. It introduces a progressive clue-and-reasoning prompting framework that first extracts superficial clues and then induces diagnostic reasoning before a final decision, and it uses kNN demonstration retrieval guided by a fine-tuned model to compensate for context limits. The approach achieves state-of-the-art results on four of five standard benchmarks and performs strongly in low-resource and domain-adaptation settings, which points to practical data efficiency and robustness. Overall, CARP shows that structured, explainable prompting combined with task-aligned retrieval can significantly boost LLM-based text classification.

Abstract

Despite the remarkable success of large-scale Language Models (LLMs) such as GPT-3, they still significantly underperform fine-tuned models on the task of text classification. This is due to (1) the lack of reasoning ability in addressing complex linguistic phenomena (e.g., intensification, contrast, irony, etc.); and (2) the limited number of tokens allowed in in-context learning. In this paper, we introduce Clue And Reasoning Prompting (CARP). CARP adopts a progressive reasoning strategy tailored to the complex linguistic phenomena involved in text classification: CARP first prompts LLMs to find superficial clues (e.g., keywords, tones, semantic relations, references, etc.), based on which a diagnostic reasoning process is induced for final decisions. To further address the limited-token issue, CARP uses a model fine-tuned on the supervised dataset for $k$NN demonstration search in in-context learning, allowing the model to take advantage of both the LLM's generalization ability and the task-specific evidence provided by the full labeled dataset. Remarkably, CARP yields new SOTA performance on 4 out of 5 widely used text-classification benchmarks: 97.39 (+1.24) on SST-2, 96.40 (+0.72) on AGNews, 98.78 (+0.25) on R8, and 96.95 (+0.6) on R52, with performance comparable to SOTA on MR (92.39 vs. 93.3). More importantly, we find that CARP delivers impressive abilities in low-resource and domain-adaptation setups. Specifically, using 16 examples per class, CARP achieves performance comparable to supervised models trained with 1,024 examples per class.
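
The $k$NN demonstration search mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each training example already has a vector representation produced by the fine-tuned model and that cosine similarity is the ranking criterion; both are assumptions made here for illustration, since the abstract does not specify them. The function names `cosine` and `knn_demonstrations` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_demonstrations(
    query_vec: Sequence[float],
    train_vecs: List[Sequence[float]],
    train_examples: List[Example],
    k: int,
) -> List[Example]:
    """Return the k training examples whose vectors are most similar to the query.

    Assumption: the vectors stand in for representations from a fine-tuned
    model; here they are plain lists of floats.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if len(train_vecs) != len(train_examples):
        raise ValueError("train_vecs and train_examples must have the same length")
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(train_vecs)]
    # Highest similarity first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [train_examples[i] for _, i in scored[:k]]


# Toy data: hand-written 2-d vectors, not real model outputs.
examples = [
    ("a wonderful, moving film", "positive"),
    ("dull and lifeless", "negative"),
    ("a delightful surprise", "positive"),
    ("a tedious mess", "negative"),
]
vectors = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3], [0.2, 0.8]]

for text, label in knn_demonstrations([1.0, 0.2], vectors, examples, k=2):
    print(f"{label}: {text}")
```

The retrieved examples would then be placed in the prompt as demonstrations, so that only the most relevant part of the labeled dataset has to fit in the limited context.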

Paper Structure

This paper contains 45 sections, 2 equations, 4 figures, 16 tables.

Figures (4)

  • Figure 1: Examples of zero-shot prompting methods for the text classification task: (a) the vanilla prompting method; (b) the Chain-of-Thought (CoT) prompting method (Kojima et al., 2022); (c) the proposed CARP prompting method.
  • Figure 2: Examples of few-shot ($k=1$) prompting methods for the text classification task: (a) the vanilla prompting method; (b) the Chain-of-Thought (CoT) prompting method (Kojima et al., 2022); (c) the proposed CARP prompting method.
  • Figure 3: Performance vs. the number of demonstrations in few-shot prompts.
  • Figure 4: Performance vs. the number of demonstrations in few-shot prompts for the CARP strategy, where LLMs are first asked to generate evidence, then to reason, and finally to generate the final result.
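
The three prompting styles compared in Figures 1 and 2 can be sketched as a small prompt builder. This is an illustrative sketch only: the instruction wording, the field names (`INPUT`, `CLUES`, `REASONING`, `LABEL`), and the function `build_prompt` are assumptions, not the paper's exact prompts. The only phrase taken from prior work is the zero-shot CoT trigger "Let's think step by step" (Kojima et al., 2022).

```python
from typing import Dict, List, Optional, Sequence


def build_prompt(
    text: str,
    labels: Sequence[str],
    style: str = "carp",
    demos: Optional[List[Dict[str, str]]] = None,
) -> str:
    """Assemble a zero-shot (no demos) or few-shot classification prompt.

    style: "vanilla" asks for the label only; "cot" asks for reasoning then
    the label; "carp" asks for clues, then reasoning, then the label.
    Each demo is a dict with an "INPUT" key plus the fields used by the style.
    """
    label_str = ", ".join(labels)
    if style == "vanilla":
        instruction = f"Classify the INPUT as one of: {label_str}."
        fields = ["LABEL"]
    elif style == "cot":
        instruction = (
            f"Classify the INPUT as one of: {label_str}. "
            "Let's think step by step."
        )
        fields = ["REASONING", "LABEL"]
    elif style == "carp":
        # Hypothetical wording for the clue -> reasoning -> decision order.
        instruction = (
            f"Classify the INPUT as one of: {label_str}. "
            "First list CLUES (keywords, tone, semantic relations, references), "
            "then give the REASONING they support, then the LABEL."
        )
        fields = ["CLUES", "REASONING", "LABEL"]
    else:
        raise ValueError(f"unknown style: {style!r}")

    parts = [instruction]
    for demo in demos or []:
        lines = [f"INPUT: {demo['INPUT']}"]
        lines += [f"{field}: {demo[field]}" for field in fields]
        parts.append("\n".join(lines))
    # Leave the first field open so the model continues from there.
    parts.append(f"INPUT: {text}\n{fields[0]}:")
    return "\n\n".join(parts)


demo = {
    "INPUT": "a wonderful, moving film",
    "CLUES": "wonderful, moving",
    "REASONING": "Both adjectives praise the film.",
    "LABEL": "positive",
}
print(build_prompt("dull and lifeless", ["positive", "negative"], "carp", [demo]))
```

Passing no demonstrations gives the zero-shot form of Figure 1, and passing a single demonstration gives the few-shot ($k=1$) form of Figure 2.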