Towards Debiasing Sentence Representations

Paul Pu Liang; Irene Mengze Li; Emily Zheng; Yao Chong Lim; Ruslan Salakhutdinov; Louis-Philippe Morency

TL;DR

The paper addresses social biases in sentence representations by introducing Sent-Debias, a post-hoc debiasing method that estimates a bias subspace from sentences containing bias attribute words and removes each sentence encoding's projection onto that subspace. Using diverse natural-language templates and PCA-based subspace estimation, Sent-Debias substantially reduces measured bias for both binary gender and multiclass religious attributes in BERT and ELMo while largely preserving downstream task performance. The work also shows that template diversity and cross-domain coverage are crucial for reliable debiasing, and it provides qualitative visualizations to illustrate bias mitigation. While acknowledging the limitations of current bias metrics and the ethical considerations involved, the paper offers a practical, scalable approach to fairer sentence representations and outlines avenues for future research in bias characterization and mitigation. Overall, Sent-Debias advances post-hoc, sentence-level debiasing as a viable complement to training-time fairness methods in modern NLP systems.
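The subspace estimation and projection removal described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `estimate_bias_subspace` and `remove_bias` are hypothetical, the PCA is done with a plain SVD, and random toy vectors with a planted bias direction stand in for real BERT or ELMo sentence encodings.

```python
import numpy as np

def estimate_bias_subspace(groups, k=1):
    """Estimate a k-dimensional bias subspace (hypothetical helper).

    groups: array of shape (n_templates, n_classes, dim). Each row holds the
    encodings of one template filled in with each bias class (for example
    "he"/"she" for binary gender, or three religions for a multiclass case).
    """
    groups = np.asarray(groups, dtype=float)
    # Subtract the per-template mean so only between-class variation remains.
    centered = groups - groups.mean(axis=1, keepdims=True)
    flat = centered.reshape(-1, groups.shape[-1])
    # PCA via SVD: the rows of vt are principal directions, strongest first.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return vt[:k]  # orthonormal basis, shape (k, dim)

def remove_bias(h, basis):
    """Subtract the projection of encoding h onto the bias subspace."""
    h = np.asarray(h, dtype=float)
    return h - (h @ basis.T) @ basis

# Toy data (an assumption for illustration): dimension 0 is the planted bias.
rng = np.random.default_rng(0)
dim = 8
bias_dir = np.zeros(dim)
bias_dir[0] = 1.0
groups = []
for _ in range(20):
    base = rng.normal(size=dim)  # stands in for the shared template content
    groups.append([base + 2.0 * bias_dir, base - 2.0 * bias_dir])

basis = estimate_bias_subspace(groups, k=1)
h = rng.normal(size=dim)  # stands in for a new sentence encoding
debiased = remove_bias(h, basis)
```

In this toy setup the recovered direction aligns with the planted one, and the debiased vector has no remaining component along it. With real encoders, the choice of `k` and the diversity of templates would need to be tuned, as the summary above notes.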

Abstract

As natural language processing methods are increasingly deployed in real-world scenarios such as healthcare, legal systems, and social science, it becomes necessary to recognize the role they potentially play in shaping social biases and stereotypes. Previous work has revealed the presence of social biases in widely used word embeddings involving gender, race, religion, and other social constructs. While some methods were proposed to debias these word-level embeddings, there is a need to perform debiasing at the sentence-level given the recent shift towards new contextualized sentence representations such as ELMo and BERT. In this paper, we investigate the presence of social biases in sentence-level representations and propose a new method, Sent-Debias, to reduce these biases. We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks such as sentiment analysis, linguistic acceptability, and natural language understanding. We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
