Prompt Perturbation Consistency Learning for Robust Language Models

Yao Qiang, Subhrangshu Nandi, Ninareh Mehrabi, Greg Ver Steeg, Anoop Kumar, Anna Rumshisky, Aram Galstyan

TL;DR

Fine-tuning sufficiently large LLMs can produce IC-SF performance comparable to discriminative models. The paper also proposes an efficient mitigation approach, Prompt Perturbation Consistency Learning (PPCL), which works by regularizing the divergence between losses from clean and perturbed samples.

Abstract

Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models. Furthermore, there is a lack of substantive research on the robustness of LLMs to various perturbations in the input prompts. The contributions of this paper are three-fold. First, we show that fine-tuning sufficiently large LLMs can produce IC-SF performance comparable to discriminative models. Next, we systematically analyze the performance deterioration of those fine-tuned models due to three distinct yet relevant types of input perturbations - oronyms, synonyms, and paraphrasing. Finally, we propose an efficient mitigation approach, Prompt Perturbation Consistency Learning (PPCL), which works by regularizing the divergence between losses from clean and perturbed samples. Our experiments demonstrate that PPCL can recover on average 59% and 69% of the performance drop for IC and SF tasks, respectively. Furthermore, PPCL beats the data augmentation approach while using ten times fewer augmented data samples.
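
The "recovered performance drop" figures above can be read as a simple ratio. The sketch below shows one plausible way to compute such a ratio; the abstract does not give the exact definition, so the formula, the function name `recovered_fraction`, and the sample scores are all assumptions made for illustration and are not results from the paper.

```python
def recovered_fraction(clean: float, perturbed: float, mitigated: float) -> float:
    """Share of the perturbation-induced drop that a mitigation wins back.

    Assumed definition (not taken from the paper):
        (mitigated - perturbed) / (clean - perturbed)

    clean      score on unperturbed inputs
    perturbed  score on perturbed inputs, without mitigation
    mitigated  score on perturbed inputs, with mitigation
    """
    drop = clean - perturbed
    if drop <= 0:
        raise ValueError("no performance drop to recover")
    return (mitigated - perturbed) / drop


if __name__ == "__main__":
    # Hypothetical accuracies, chosen only to exercise the function.
    fraction = recovered_fraction(clean=0.90, perturbed=0.70, mitigated=0.80)
    print(f"recovered {fraction:.0%} of the drop")
```

Under this reading, a value of 1.0 would mean the mitigated model matches clean-input performance, and 0.0 would mean no improvement over the unmitigated model.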

Paper Structure

This paper contains 32 sections, 7 equations, 2 figures, 12 tables.

Figures (2)

  • Figure 1: Illustrative examples. LLMs are expected to generate structured hypotheses, i.e., domain, intent, and slots, in their responses to given user requests. The model prediction (shown in red) changes under minor perturbations.
  • Figure 2: Perturbation consistency learning architecture. $x_c$ and $x_p$ denote the clean and perturbed utterances, respectively. $\hat{y}_c$ and $\hat{y}_p$ denote the slot labels generated by the LLM. $\hat{y}_c^j$ and $\hat{y}_p^j$ represent the output probability distributions of the tokens of current interest, i.e., 'date' and 'O'. JS denotes Jensen–Shannon divergence.
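
The consistency term described in the Figure 2 caption compares output probability distributions from the clean and perturbed inputs using Jensen–Shannon divergence. The following is a minimal sketch of that comparison in plain Python. It assumes the two sequences are already aligned position by position and that each position carries a distribution over the same label set. The helper names, the three-label example, and the averaging over positions are illustrative assumptions. How the paper combines this term with the task loss is not stated in this section and is not shown here.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, bounded by log(2) in nats."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def consistency_loss(clean_dists, perturbed_dists) -> float:
    """Mean JS divergence over aligned positions (averaging is an assumption)."""
    pairs = list(zip(clean_dists, perturbed_dists))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical distributions over the labels ("date", "O", "time").
    clean = [[0.90, 0.05, 0.05], [0.10, 0.85, 0.05]]
    perturbed = [[0.30, 0.60, 0.10], [0.10, 0.85, 0.05]]
    print(consistency_loss(clean, perturbed))
```

Identical distributions give a divergence of zero, so a term of this kind penalizes only positions where the perturbed input changes the model's output distribution.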