Addressing Stereotypes in Large Language Models: A Critical Examination and Mitigation

Fatima Kazi

TL;DR

The work addresses how Large Language Models inherit social biases from training data and proposes a bias-evaluation framework using StereoSet and CrowSPairs. It combines implicit and explicit bias prompting with data augmentation and cross-dataset testing across BERT, DistilBERT, GPT-3.5, and T5 to quantify and mitigate stereotypes. Key findings show that fine-tuning with augmented data can reduce implicit bias and improve cross-dataset performance, but gender bias remains challenging and benchmark limitations persist. The study introduces a Bias-Identification Framework to standardize bias assessment and highlights avenues for future research, including multimodal biases and more robust debiasing strategies.

Abstract

Large Language Models (LLMs), such as ChatGPT, have gained popularity in recent years with the advancement of Natural Language Processing (NLP), with use cases spanning many disciplines as well as everyday life. LLMs inherit explicit and implicit biases from the datasets they were trained on; these biases can include social, ethical, cultural, religious, and other prejudices and stereotypes. It is important to examine such shortcomings comprehensively by identifying the existence and extent of these biases, recognizing their origin, and mitigating biased outputs so that models produce fair outputs and spread fewer harmful stereotypes and less misinformation. This study inspects and highlights the need to address biases in LLMs amid the growth of generative Artificial Intelligence (AI). We utilize bias-specific benchmarks such as StereoSet and CrowSPairs to evaluate the existence of various biases in many different generative models such as BERT, GPT 3.5, and ADA. To detect both explicit and implicit biases, we adopt a three-pronged approach for thorough and inclusive analysis. Results indicate that fine-tuned models struggle with gender biases but excel at identifying and avoiding racial biases. Our findings also show that, despite some cases of success, LLMs often over-rely on keywords in prompts and in their outputs. This demonstrates the inability of LLMs to truly understand the accuracy and authenticity of their outputs. Finally, in an attempt to bolster model performance, we applied an enhancement learning strategy involving fine-tuning models using different prompting techniques and data augmentation of the bias benchmarks. We found that fine-tuned models exhibit promising adaptability during cross-dataset testing and significantly enhanced performance on implicit bias benchmarks, with performance gains of up to 20%.
