Adaptable and Reliable Text Classification using Large Language Models

Zhiqiang Wang; Yiran Pang; Yanbin Lin; Xingquan Zhu

TL;DR

This work tackles adaptable and reliable text classification across diverse domains by using Large Language Models (LLMs) as the core classifier. It introduces a streamlined, domain-agnostic pipeline that supports zero-shot prompting, few-shot prompting, or fine-tuning, with optional domain knowledge and an evaluation subsystem, and it proposes a novel Uncertainty/Error Rate ($U/E$) metric alongside the standard accuracy ($ACC$) and $F1$ metrics. Across four varied datasets, LLMs often outperform traditional ML and NN baselines, and fine-tuned LLMs (notably Qwen-7B(F)) achieve the top performance while markedly reducing unreliability. The findings suggest that LLM-based classification can be deployed with less preprocessing and domain expertise, which offers practical benefits for small businesses and broad NLP applications. The work also highlights limitations related to model outputs, accessibility, and compute requirements.
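To make the role of the $U/E$ metric concrete, the following is a minimal sketch of how accuracy and an uncertainty/error rate could be computed together. It rests on an assumption that this summary does not spell out: $U/E$ is taken to be the fraction of model responses that cannot be mapped to exactly one allowed label. The helper names `normalize_output` and `evaluate` are hypothetical and are not taken from the authors' repository.

```python
from typing import Dict, List, Optional, Sequence


def normalize_output(raw: str, labels: Sequence[str]) -> Optional[str]:
    """Map a raw model response to one allowed label, or None if that fails.

    Assumption: a response is usable only if it mentions exactly one label.
    """
    text = raw.strip().lower()
    matches = [label for label in labels if label.lower() in text]
    # Zero matches (refusal, off-topic) or several matches (hedging) are unusable.
    return matches[0] if len(matches) == 1 else None


def evaluate(raw_outputs: List[str], gold: List[str],
             labels: Sequence[str]) -> Dict[str, float]:
    """Return accuracy and the assumed U/E rate over a batch of responses."""
    preds = [normalize_output(raw, labels) for raw in raw_outputs]
    total = len(gold)
    unusable = sum(pred is None for pred in preds)
    # Assumption: unusable responses count as incorrect for accuracy.
    correct = sum(pred == truth for pred, truth in zip(preds, gold))
    return {"acc": correct / total, "u_e": unusable / total}


labels = ["positive", "negative", "neutral"]
raw_outputs = [
    "Positive",
    "The sentiment is negative.",
    "I cannot decide.",
    "positive or negative",
]
gold = ["positive", "negative", "neutral", "positive"]
scores = evaluate(raw_outputs, gold, labels)
print(scores)
```

Reporting the two numbers side by side separates a model that is wrong from one that fails to produce a usable answer at all, which is the kind of unreliability the TL;DR refers to.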

Abstract

Text classification is fundamental in Natural Language Processing (NLP), and the advent of Large Language Models (LLMs) has revolutionized the field. This paper introduces an adaptable and reliable text classification paradigm that uses LLMs as the core component. Our system simplifies the traditional text classification workflow, reducing the need for extensive preprocessing and domain-specific expertise. We evaluate several LLMs, machine learning algorithms, and neural network-based architectures on four diverse datasets. The results show that certain LLMs surpass traditional methods in sentiment analysis, spam SMS detection, and multi-label classification. The system's performance can be further improved through few-shot or fine-tuning strategies, with the fine-tuned model performing best across all datasets. Source code and datasets are available in this GitHub repository: https://github.com/yeyimilk/llm-zero-shot-classifiers.

Paper Structure (24 sections, 3 equations, 3 figures, 8 tables)

Introduction
Background and Related Work
Traditional Text Classification Approaches
Deep Learning Approaches
LLM Approaches
Methodology
Adaptable and Reliable System
Evaluation metrics
Accuracy
F1 Score
U/E Rate
Dataset
COVID-19-related Tweets Dataset
Economic Texts Dataset
E-commerce Texts Dataset
...and 9 more sections

Figures (3)

Figure 1: Traditional text classification flow
Figure 2: LLMs' zero-shot text classification simple flow
Figure 3: Framework of our adaptable and reliable text classification system. The framework comprises the following steps: (1) collect data from the data source to build the domain database; (2) send domain-specific data to the pre-trained LLM, such as GPT-4 or Llama-3; (3) use a small amount of domain-specific data for fine-tuning or instruction tuning; (4) apply the fine-tuning or instruction tuning to the pre-trained LLM; (5) (optional) use domain knowledge to design prompts that improve LLM performance; (6) apply the prompts to the pre-trained model; (7) evaluate the whole system's performance; (8) non-expert users submit tasks to the system through the user interface (tasks may include classification, sentiment analysis, prediction, and recommendation; in this paper, we take multi-class classification and sentiment analysis as examples); (9) the LLM API connects the user interface and the pre-trained LLM, returning the model's advice to the user interface.
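The prompting path through the framework in Figure 3 (steps 5, 6, and 9) can be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: `build_prompt`, `classify`, and `fake_llm` are hypothetical names, the prompt wording is invented for the example, and the model is passed in as a plain callable so that any real LLM API could be substituted.

```python
from typing import Callable, Optional, Sequence, Tuple


def build_prompt(text: str, labels: Sequence[str],
                 examples: Sequence[Tuple[str, str]] = (),
                 domain_hint: Optional[str] = None) -> str:
    """Assemble a zero-shot or few-shot classification prompt."""
    parts = []
    if domain_hint:
        # Step 5 (optional): inject domain knowledge into the prompt.
        parts.append(f"Context: {domain_hint}")
    parts.append("Classify the text into exactly one of: " + ", ".join(labels) + ".")
    # Few-shot examples; an empty sequence gives a zero-shot prompt.
    for example_text, example_label in examples:
        parts.append(f"Text: {example_text}\nLabel: {example_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


def classify(text: str, labels: Sequence[str], llm: Callable[[str], str],
             examples: Sequence[Tuple[str, str]] = (),
             domain_hint: Optional[str] = None) -> Optional[str]:
    """Send the prompt to the model (step 6) and map the reply to a label."""
    reply = llm(build_prompt(text, labels, examples, domain_hint)).strip().lower()
    for label in labels:
        if reply == label.lower():
            return label
    return None  # the reply was not one of the allowed labels


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; inspects only the query text."""
    query = prompt.rsplit("Text:", 1)[1].lower()
    return "spam" if "free prize" in query else "ham"


labels = ["spam", "ham"]
few_shot = [("Lunch at noon?", "ham")]
print(classify("Claim your FREE PRIZE now", labels, fake_llm, examples=few_shot))
```

Returning `None` for replies outside the label set keeps unusable outputs visible to the evaluation step (step 7) instead of silently coercing them into a class.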
