Soft-prompt Tuning for Large Language Models to Evaluate Bias

Jacob-Junqi Tian, David Emerson, Sevil Zanjani Miyandoab, Deval Pandya, Laleh Seyyed-Kalantari, Faiza Khan Khattak

TL;DR

This paper uses soft-prompt tuning on a sentiment classification task to quantify the biases of large language models such as Open Pre-trained Transformers (OPT) and Galactica, and finds interesting bias patterns.

Abstract

Prompting large language models has gained immense popularity in recent years because it can produce good results without labelled data. However, it requires prompt tuning to find optimal prompts that lead to better model performance. In this paper, we explore the use of soft-prompt tuning on a sentiment classification task to quantify the biases of large language models (LLMs) such as Open Pre-trained Transformers (OPT) and the Galactica language model. Since these models are trained on real-world data that could be prone to bias toward certain population groups, it is important to identify these underlying issues. Using soft prompts to evaluate bias has the extra advantage of avoiding the human-bias injection that manually designed prompts can cause. We check the model biases on different sensitive attributes using group fairness (bias) metrics and find interesting bias patterns. Because LLMs are used in industry in various applications, it is crucial to identify these biases before deploying the models in practice. We open-source our pipeline and encourage industry researchers to adapt our work to their use cases.
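
As a rough illustration of the soft-prompt idea described in the abstract and in the Figure 1 caption, the sketch below builds prompt positions from the beginning-of-sequence embedding plus a learned offset and updates only those offsets. It is not the authors' pipeline: the two-dimensional embedding table `EMBED`, the linear scorer `W`, the squared-error loss, and every function name are hypothetical stand-ins for a frozen LLM, chosen so the example runs with the standard library alone.

```python
# Toy stand-in for a frozen language model: an embedding table plus a linear scorer.
# All names, sizes, and values are hypothetical and used only for illustration.
EMBED = {"<bos>": [0.1, -0.2], "good": [0.9, 0.3], "bad": [-0.8, 0.4]}  # frozen
W = [1.0, -0.5]  # frozen scorer weights
N_PROMPT = 2     # number of soft-prompt positions


def build_inputs(delta, tokens):
    """Prompt positions start from the <bos> embedding and add a learned offset."""
    bos = EMBED["<bos>"]
    prompt = [[b + d for b, d in zip(bos, row)] for row in delta]
    return prompt + [EMBED[t] for t in tokens]


def score(delta, tokens):
    """Mean-pool the sequence and apply the frozen linear scorer."""
    seq = build_inputs(delta, tokens)
    pooled = [sum(col) / len(seq) for col in zip(*seq)]
    return sum(w * x for w, x in zip(W, pooled))


def train_step(delta, tokens, target, lr=0.5):
    """One squared-error gradient step that updates only the prompt offsets."""
    n = len(delta) + len(tokens)
    err = score(delta, tokens) - target
    # d(loss)/d(delta[i][j]) = 2 * err * W[j] / n; EMBED and W receive no update.
    return [[d - lr * 2 * err * w / n for d, w in zip(row, W)] for row in delta]


delta = [[0.0, 0.0] for _ in range(N_PROMPT)]  # zero offsets: prompts equal <bos>
loss_before = (score(delta, ["good"]) - 1.0) ** 2
for _ in range(20):
    delta = train_step(delta, ["good"], 1.0)
loss_after = (score(delta, ["good"]) - 1.0) ** 2
```

Only `delta` changes across steps, which mirrors the caption's statement that all weights are frozen except the prompt embedding layer. In a real setting the scorer would be the frozen transformer and the gradients would come from automatic differentiation.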

Paper Structure

This paper contains 18 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Illustration of the prompt-tuning approach used for parameter-efficient fine-tuning of the models. The prompt tokens, depicted with orange hatching, are initialized as the beginning-of-sequence token embedding. These embeddings are subsequently perturbed by adding learned prompt embeddings. All weights are frozen during back-propagation except for the prompt embedding layer.
  • Figure 2: Positive FPR gap for the sensitive attribute of sexuality. Markers indicate the average gap and bars are 95% confidence intervals. A positive gap indicates model errors that favour a particular group over others. For example, the rate at which asexual examples benefit from model mistakes is consistently lower than for other groups on both SemEval and SST-5.
  • Figure 3: Negative FPR gap for the sensitive attribute of sexuality. Markers indicate the average gap and bars are 95% confidence intervals. A positive gap indicates model errors that harm a particular group disproportionately compared with others. Examples belonging to the asexual and homosexual groups are erroneously cast in a negative light at higher rates than others.
  • Figure 4: Positive FPR gap for the sensitive attribute of age. Markers indicate the average gap and bars are 95% confidence intervals. A positive gap indicates model errors that favour a particular group over others. The rate at which elderly examples benefit from model mistakes is generally lower than for other groups.
  • Figure 5: Negative FPR gap for the sensitive attribute of age. Markers indicate the average gap and bars are 95% confidence intervals. A positive gap indicates model errors that harm a particular group disproportionately compared with others. The rate at which adult examples suffer from unfavourable model mistakes is consistently much smaller than for other groups on SemEval. This conclusion is not as clear for SST-5.
  • ...and 2 more figures
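
The per-group FPR gaps reported in the figure captions above can be sketched roughly as follows. The captions do not give the exact formula, so this is an illustration under stated assumptions: the FPR for a sentiment class is taken as the share of examples whose true label is not that class but that are predicted as that class, and a group's gap is taken as its FPR minus the unweighted mean FPR across groups. The function names `class_fpr` and `fpr_gaps` are hypothetical, and the confidence intervals shown in the figures are not computed here.

```python
from collections import defaultdict


def class_fpr(y_true, y_pred, target):
    """Share of examples not labelled `target` that are predicted as `target`."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t != target]
    if not preds_on_negatives:
        return float("nan")  # undefined when the group has no such examples
    return sum(p == target for p in preds_on_negatives) / len(preds_on_negatives)


def fpr_gaps(y_true, y_pred, groups, target):
    """Per-group FPR minus a reference FPR.

    Assumption: the reference is the unweighted mean of the per-group FPRs.
    The paper may use a different reference value.
    """
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    rates = {g: class_fpr(ts, ps, target) for g, (ts, ps) in by_group.items()}
    reference = sum(rates.values()) / len(rates)
    return {g: rate - reference for g, rate in rates.items()}
```

With `target` set to the positive label, a gap above zero means a group's examples are mistakenly rated positive more often than the reference, matching the captions' reading of a favourable error. With `target` set to the negative label, a gap above zero corresponds to disproportionate harm.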