
FairPy: A Toolkit for Evaluation of Prediction Biases and their Mitigation in Large Language Models

Hrishikesh Viswanath, Tianyi Zhang

TL;DR

FairPy provides a general, plug-and-play framework to evaluate and mitigate token prediction biases in large language models, addressing model-architecture diversity and reproducibility. It decouples bias metrics from specific models and offers modular implementations of both detection (e.g., Hellinger Distance, WEAT/SEAT, StereoSet, Honest Score, Log Likelihood) and mitigation (e.g., DiffPruning, Null Space Projection, CDA, Self Debias, Dropout) techniques. The work surveys existing bias measurement approaches, demonstrates a practical, extensible toolkit, and presents an empirical analysis that highlights how results depend on the chosen datasets and templates, guiding safer deployment of LMs. Overall, FairPy aims to standardize bias evaluation in NLP pipelines and accelerate robust debiasing practices across diverse language models and applications.
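
Of the detection metrics listed above, the Hellinger distance is the simplest to state: for two discrete distributions P and Q over the same support, H(P, Q) = (1/√2) · sqrt(Σ_i (√p_i − √q_i)²), which lies in [0, 1]. The following is a minimal sketch of the general idea, assuming a model's next-token probabilities have already been collected for two prompts that differ only in a demographic term. The candidate words and probability values are invented for illustration, and this is not FairPy's own implementation.

```python
import math
from typing import Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same support")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Invented next-token probabilities over the candidates ["nurse", "engineer", "teacher"]
# for two prompts that differ only in the gendered subject (illustrative values only).
p_he = [0.2, 0.5, 0.3]
p_she = [0.5, 0.2, 0.3]

print(round(hellinger(p_he, p_she), 4))  # 0.2599
```

A value of 0 means the two prompts induce identical predictions over the candidate set; values closer to 1 indicate that swapping the demographic term shifts the model's predictions more strongly.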

Abstract

Recent studies have demonstrated that large pretrained language models (LLMs) such as BERT and GPT-2 exhibit biases in token prediction, often inherited from the data distributions present in their training corpora. In response, a number of mathematical frameworks have been proposed to quantify and identify these biases and to reduce the likelihood of biased token predictions. In this paper, we present a comprehensive survey of such techniques as applied to widely used LLMs such as BERT and GPT-2. We additionally introduce FairPy, a modular and extensible toolkit that provides plug-and-play interfaces for integrating these mathematical tools, enabling users to evaluate both pretrained and custom language models. FairPy also includes implementations of existing debiasing algorithms. The toolkit is open-source and publicly available at https://github.com/HrishikeshVish/Fairpy.
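
The decoupling of metrics from models described in the abstract can be pictured as a small interface contract: a metric only calls a method that every model wrapper exposes, so switching from BERT to GPT-2 or to a custom model means writing a new wrapper rather than a new metric. This is a sketch under stated assumptions: the names `ProbabilisticLM`, `token_prob`, `LookupLM`, and `LogProbGap` are hypothetical and are not FairPy's actual API, a toy lookup table stands in for a real language model, and the log-probability gap is a simple stand-in metric rather than any specific metric from the paper.

```python
import math
from typing import Dict, List, Protocol, Sequence, Tuple


class ProbabilisticLM(Protocol):
    """Hypothetical interface: any model that can return P(token | prompt)."""

    def token_prob(self, prompt: str, token: str) -> float: ...


class LookupLM:
    """Toy stand-in model backed by a fixed probability table (illustrative only)."""

    def __init__(self, table: Dict[str, Dict[str, float]]):
        self.table = table

    def token_prob(self, prompt: str, token: str) -> float:
        return self.table[prompt][token]


class LogProbGap:
    """Hypothetical metric: mean |log P(t|a) - log P(t|b)| over paired prompts and targets.

    It only calls token_prob, so it accepts any object satisfying ProbabilisticLM.
    """

    def __init__(self, pairs: Sequence[Tuple[str, str]], targets: Sequence[str]):
        self.pairs = list(pairs)
        self.targets = list(targets)

    def score(self, model: ProbabilisticLM) -> float:
        gaps: List[float] = []
        for a, b in self.pairs:
            for t in self.targets:
                gaps.append(abs(math.log(model.token_prob(a, t)) - math.log(model.token_prob(b, t))))
        return sum(gaps) / len(gaps)


model = LookupLM({
    "He worked as a": {"nurse": 0.1, "doctor": 0.4},
    "She worked as a": {"nurse": 0.4, "doctor": 0.1},
})
metric = LogProbGap([("He worked as a", "She worked as a")], ["nurse", "doctor"])
print(metric.score(model))  # ≈ 1.386, i.e. ln 4
```

Because `LogProbGap.score` depends only on `token_prob`, a wrapper around a real masked or causal language model could be passed in unchanged. Using `typing.Protocol` keeps this a structural contract, so model wrappers need not inherit from a shared base class.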

Paper Structure

This paper contains 17 sections, 11 equations, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: The flow diagram highlights the overall architecture of the model, with nodes labelled in red denoting bias detection techniques and nodes in green denoting bias mitigation techniques.