ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs

Pengrui Han; Rafal Kocielnik; Adhithya Saravanan; Roy Jiang; Or Sharir; Anima Anandkumar

TL;DR

This work introduces a novel approach that uses ChatGPT to generate synthetic training data for debiasing LLMs, and proposes two strategies: Targeted Prompting, which debiases known biases effectively but requires the bias in question to be specified in advance; and General Prompting, which offers debiasing across various categories.

Abstract

Large language models (LLMs), while powerful, exhibit harmful social biases. Debiasing is often challenging due to computational costs, data constraints, and potential degradation of multi-task language capabilities. This work introduces a novel approach utilizing ChatGPT to generate synthetic training data, aiming to enhance the debiasing of LLMs. We propose two strategies: Targeted Prompting, which provides effective debiasing for known biases but necessitates prior specification of the bias in question; and General Prompting, which, while slightly less effective, offers debiasing across various categories. We leverage resource-efficient LLM debiasing using adapter tuning and compare the effectiveness of our synthetic data to existing debiasing datasets. Our results reveal that: (1) ChatGPT can efficiently produce high-quality training data for debiasing other LLMs; (2) data produced via our approach surpasses existing datasets in debiasing performance while also preserving internal knowledge of a pre-trained LLM; and (3) synthetic data exhibits generalizability across categories, effectively mitigating various biases, including intersectional ones. These findings underscore the potential of synthetic data in advancing the fairness of LLMs with minimal retraining cost.
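The difference between the two strategies can be illustrated with a minimal sketch. It assumes that the only thing separating them is whether a bias category is named in the request; `PromptSpec`, `build_prompt`, and the prompt wording are hypothetical names and text chosen for illustration, not the prompts used in the paper, and no call to ChatGPT is made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container describing one synthetic-data request."""
    n_sentences: int
    bias_category: Optional[str] = None  # None selects general prompting


def build_prompt(spec: PromptSpec) -> str:
    """Return an illustrative instruction string for a chat model.

    The wording is an assumption for illustration only.
    """
    if spec.n_sentences <= 0:
        raise ValueError("n_sentences must be positive")
    if spec.bias_category is not None:
        # Targeted prompting: the bias category has to be known in advance.
        scope = f"stereotypes about {spec.bias_category}"
    else:
        # General prompting: no category is specified up front.
        scope = "social stereotypes of any kind"
    return (
        f"Write {spec.n_sentences} short sentences that counter {scope}. "
        "Return one sentence per line."
    )


targeted = build_prompt(PromptSpec(n_sentences=5, bias_category="gender"))
general = build_prompt(PromptSpec(n_sentences=5))
```

In this sketch the targeted request cannot be built without naming a category, which mirrors the stated limitation of Targeted Prompting, while the general request leaves the category open.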

Paper Structure (55 sections, 6 figures, 26 tables, 2 algorithms)

Introduction
Our Approach
Findings
Contributions
Related Work
Social Bias in Large Language Models (LLMs)
Bias Mitigation Techniques
Synthetic Data Generation with Generative AI
Methodology
Targeted Prompting:
General Prompting:
Loss-Guided Prompting:
Training Methodology:
Experiment
Metrics and Datasets:
...and 40 more sections

Figures (6)

Figure 1: Debiasing performance of different strategies on GPT-2 and BERT averaged across three bias categories and two datasets (StereoSet and CrowS-Pairs).
Figure 2: Average bias score across three bias categories and two metrics for different GPT-2 family models before and after synthetic debiasing.
Figure 3: Our debiasing framework using ChatGPT-based synthetic dataset generation and AdapterTuning. The upper part is the process for targeted prompting and the bottom is for general prompting.
Figure 4: The most frequent words generated through each prompting are visualized via word clouds. The larger the word, the more frequently it has been generated.
Figure 5: This graph illustrates a clear trade-off between the model's language capabilities and debiasing performance during training. Lowering bias in a language model is likely to impact its general language proficiency. This represents a fundamental challenge in the field of language model fairness.
...and 1 more figure
