Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

Vikas Yadav, Zheng Tang, Vijay Srinivasan

TL;DR

This paper introduces the Paraphrase and AGgregate (PAG)-LLM approach, in which an LLM generates multiple paraphrases of the input query, performs intent classification for the original query and each paraphrase, and finally aggregates all the predicted intent labels based on their confidence scores.

Abstract

Large language models (LLMs) have achieved remarkable success in natural language generation, but less attention has been given to their applicability in decision-making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and, worse, generate out-of-vocabulary class labels. To address these critical issues, we introduce the Paraphrase and AGgregate (PAG)-LLM approach, wherein an LLM generates multiple paraphrases of the input query (parallel queries), performs multi-class classification for the original query and each paraphrase, and finally aggregates all the classification labels based on their confidence scores. We evaluate PAG-LLM on two large multi-class classification datasets, CLINC and Banking, and show 22.7% and 15.1% error reduction, respectively. We show that PAG-LLM is especially effective for hard examples where the LLM is uncertain, and that it reduces critical misclassification and hallucinated label generation errors.

Paper Structure (9 sections, 2 figures, 3 tables, 1 algorithm)

This paper contains 9 sections, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: Left: the flow of PAG-LLM. In panel (A), the LLM classifies the original query; only if the classification confidence is lower than $\tau$ is the original query given to the LLM to generate paraphrases, which are then given to the LLM again for classification. Finally, the LLM aggregates the predicted class labels from the paraphrases and the original query. Right: examples from CLINC where the LLM predicts an incorrect label (top example) and an out-of-vocabulary (OOV) class label (bottom example). In the top example, the paraphrases generated by PAG-LLM enable correct classification decisions with high confidence scores, so even simple majority-voting aggregation leads to the correct class prediction. In the bottom example, only paraphrase2 from PAG-LLM enables correct classification, while the remaining paraphrases and the original query receive OOV class labels. PAG-LLM aggregates the texts of the input and the paraphrases, together with their labels and confidences, to finally predict the correct class label.
  • Figure 2: Plot showing the portion of inference data and the error rate reduction on CLINC as the classification threshold ($\tau$) increases.
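
The flow described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `pag_classify`, `classify`, `paraphrase`, and `valid_labels` are hypothetical, as are the default `tau` and paraphrase count. The paper has the LLM itself perform the aggregation step. The sketch swaps in a simpler confidence-weighted vote over in-vocabulary labels as a stand-in.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interfaces: in practice both would wrap calls to an LLM.
Classifier = Callable[[str], Tuple[str, float]]  # text -> (label, confidence)
Paraphraser = Callable[[str, int], List[str]]    # (text, n) -> up to n paraphrases


def pag_classify(
    query: str,
    classify: Classifier,
    paraphrase: Paraphraser,
    valid_labels: Set[str],
    tau: float = 0.9,          # assumed threshold value, for illustration only
    n_paraphrases: int = 3,    # assumed number of paraphrases
) -> str:
    """Classify a query, paraphrasing only when confidence is below tau."""
    label, conf = classify(query)
    if conf >= tau:
        # Confident prediction: skip paraphrasing and aggregation.
        return label

    # Low confidence: classify the original query and each paraphrase.
    predictions = [(label, conf)]
    for text in paraphrase(query, n_paraphrases):
        predictions.append(classify(text))

    # Stand-in aggregation (assumption): sum confidences per label,
    # ignoring out-of-vocabulary labels.
    scores: Dict[str, float] = defaultdict(float)
    for lab, c in predictions:
        if lab in valid_labels:
            scores[lab] += c

    if not scores:
        # Every prediction was out of vocabulary; fall back to the original.
        return label
    return max(scores, key=lambda lab: scores[lab])


# Toy stubs mimicking the bottom example in Figure 1: only one paraphrase
# yields an in-vocabulary label.
def toy_classify(text: str) -> Tuple[str, float]:
    table = {
        "q": ("made_up_label", 0.4),
        "p1": ("made_up_label", 0.5),
        "p2": ("balance", 0.8),
        "p3": ("another_made_up_label", 0.3),
    }
    return table[text]


def toy_paraphrase(text: str, n: int) -> List[str]:
    return ["p1", "p2", "p3"][:n]


result = pag_classify("q", toy_classify, toy_paraphrase, {"balance", "transfer"})
print(result)
```

With these toy stubs, the original query and two of the paraphrases receive out-of-vocabulary labels, so the single in-vocabulary prediction decides the outcome. Gating on `tau` means only low-confidence queries pay the extra cost of paraphrasing and re-classification. This corresponds to the trade-off that Figure 2 plots against the threshold.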