ADEPT: A DEbiasing PrompT Framework

Ke Yang; Charles Yu; Yi Fung; Manling Li; Heng Ji

TL;DR

ADEPT addresses the challenge of debiasing large pretrained language models without sacrificing their representational capabilities. It uses prompt tuning with a manifold-learning-inspired objective that combines a debiasing term $L_{bias}$ with a representation-preservation term $L_{representation}$, and it builds attribute prototypes from contextualized embeddings. The method achieves competitive bias mitigation on SEAT, CrowS-Pairs, and StereoSet while preserving or improving GLUE task performance, and it is parameter-efficient, training only a small prompt (around 1.97M parameters). Visualizations of word correlations suggest that ADEPT preserves relative distances among neutral words and brings attribute words closer together on the manifold, offering a geometrically interpretable view of debiasing effects. Overall, ADEPT provides a practical framework for debiasing contextualized word representations with low computational overhead and an interpretable account of debiasing in terms of attribute prototypes on a manifold.
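To make the two-term objective concrete, here is a minimal sketch in plain Python. The summary above only names the terms $L_{bias}$ and $L_{representation}$; the specific forms used here are illustrative assumptions, not the paper's confirmed definitions: a Jensen-Shannon divergence between the distance-based distributions of two attribute prototypes for the bias term, a KL divergence between original and prompted distributions for the preservation term, the kernel width `rho`, and the weight `lam`. All function names are hypothetical.

```python
import math

def neighbor_distribution(prototype, neutral_vecs, rho=1.0):
    """Softmax over negative squared distances from a prototype to each neutral word.
    Assumed form, in the spirit of manifold-learning methods such as SNE."""
    logits = [
        -sum((p - w) ** 2 for p, w in zip(prototype, vec)) / (2 * rho ** 2)
        for vec in neutral_vecs
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence built from two KL terms."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

def combined_loss(proto_a, proto_b, neutral_vecs,
                  original_dist, prompted_dist, lam=1.0, rho=1.0):
    """Return (total, l_bias, l_representation) under the assumptions above."""
    # Bias term: both attribute prototypes should "see" neutral words the same way.
    l_bias = js_divergence(
        neighbor_distribution(proto_a, neutral_vecs, rho),
        neighbor_distribution(proto_b, neutral_vecs, rho),
    )
    # Preservation term: the prompted model should stay close to the original.
    l_representation = kl_divergence(original_dist, prompted_dist)
    return l_bias + lam * l_representation, l_bias, l_representation

# Toy 2-D example: two opposed prototypes and two neutral words.
neutral = [[1.0, 0.5], [-1.0, 0.5]]
total, l_bias, l_rep = combined_loss(
    [1.0, 0.0], [-1.0, 0.0], neutral, [0.7, 0.3], [0.6, 0.4]
)
print(total, l_bias, l_rep)
```

In this sketch the bias term vanishes when both prototypes sit at the same distances from every neutral word, and the preservation term vanishes when the prompted distribution matches the original one, which mirrors the balance the summary describes.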

Abstract

Several works have shown that finetuning is an applicable approach for debiasing contextualized word embeddings. Similarly, discrete prompts with semantic meanings have been shown to be effective in debiasing tasks. Because their token-level mathematical representation is not fixed, continuous prompts usually surpass discrete ones at providing a pre-trained language model (PLM) with additional task-specific information. Despite this, relatively few efforts have been made to debias PLMs by prompt tuning with continuous prompts compared with discrete ones. Furthermore, for most debiasing methods that alter a PLM's original parameters, a major problem is the need not only to decrease the bias in the PLM but also to ensure that the PLM does not lose its representation ability. Finetuning methods typically have a hard time maintaining this balance, as they tend to remove the meanings of attribute words too aggressively. In this paper, we propose ADEPT, a method to debias PLMs using prompt tuning while maintaining the delicate balance between removing biases and ensuring representation ability. To achieve this, we propose a new training criterion inspired by manifold learning and equip it with an explicit debiasing term to optimize prompt tuning. In addition, we conduct several experiments on the reliability, quality, and quantity of a previously proposed attribute training corpus in order to obtain a clearer prototype of a given attribute, which indicates the attribute's position and its relative distances to other words on the manifold. We evaluate ADEPT on several widely acknowledged debiasing benchmarks and downstream tasks, and find that it achieves competitive results while maintaining (and in some cases even improving) the PLM's representation ability. We further visualize the correlations between words before and after debiasing a PLM, and give some possible explanations for the visible effects.
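As a rough illustration of what an attribute prototype could look like, the sketch below averages the contextualized embeddings of an attribute's words over the sentences they occur in. The abstract says only that a prototype indicates an attribute's position on the manifold; mean pooling, the function names, and the toy 2-D vectors are all assumptions made for illustration. In practice the vectors would come from a PLM's hidden states.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def attribute_prototype(contextual_embeddings, attribute_words):
    """Pool every occurrence of every attribute word, then average.

    contextual_embeddings maps a word to a list of vectors, one per
    sentence in which the word appears (hypothetical data layout).
    """
    pooled = [
        vec
        for word in attribute_words
        for vec in contextual_embeddings.get(word, [])
    ]
    return mean_vector(pooled)

# Toy stand-ins for PLM hidden states (2-D for readability).
embeddings = {
    "he": [[1.0, 0.0], [3.0, 0.0]],   # "he" seen in two sentences
    "man": [[2.0, 2.0]],              # "man" seen in one sentence
}
prototype = attribute_prototype(embeddings, ["he", "man"])
print(prototype)
```

Pooling over more sentences per word is one simple way a larger or cleaner corpus could yield a more stable prototype, which is the kind of effect the reliability, quality, and quantity experiments mentioned above would probe.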

Paper Structure

This paper contains 35 sections, 8 equations, 2 figures, 2 tables, and 1 algorithm.

Introduction
Related Work
Debiasing Methods
Word Embeddings
Discrete Prompts
Finetuning Setting
Prompt Tuning
Manifold Learning
Methodology
Define Word Tuples and Collect Sentences
Calculate Prototypes of Neutral Words/Attributes
Define Tuning Loss
Improve Prototypes of Attributes
Reliability
Quality
...and 20 more sections

Figures (2)

Figure 1: An illustration of how debiasing works using ADEPT and for downstream tasks.
Figure 2: Visualized correlation of words in the gender domain. We use t-SNE to plot the figures and set the perplexity to 30. We color neutral words beige, male words blue, and female words red.
