No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language Models
Charaka Vinayak Kumar, Ashok Urlana, Gopichand Kanumolu, Bala Mallikarjunarao Garlapati, Pruthwik Mishra
TL;DR
This work tackles the problem of bias in large language models by proposing a unified evaluation framework that spans multiple bias facets and datasets. It introduces five prompting-based bias detection methods and evaluates four representative open-source LLMs across six benchmarks, focusing on metrics such as LMS, SS, ICAT, and gender/race-specific scores. The key findings show that all analyzed LLMs exhibit some form of bias, with Phi-3.5B generally the least biased and LL-8B often more biased, while bias patterns vary by bias type and contextual setup. The study highlights significant challenges, including the need for standardized metrics, broader bias coverage, and explainability, and outlines future directions for robust bias detection and mitigation in LLMs.
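The summary above names LMS, SS, and ICAT without defining them. The sketch below assumes these follow the standard StereoSet definitions, which this page does not confirm: LMS is the percentage of cases where the model prefers a meaningful completion over an unrelated one, SS is the percentage where it prefers the stereotypical completion over the anti-stereotypical one, and ICAT combines the two. The function name `icat` and its percentage-scaled inputs are illustrative choices, not taken from the paper.

```python
def icat(lms: float, ss: float) -> float:
    """Idealized CAT score, assuming the StereoSet definition.

    lms: language modeling score, as a percentage in [0, 100].
    ss:  stereotype score, as a percentage in [0, 100]; 50 means no
         preference between stereotypical and anti-stereotypical options.
    """
    if not (0.0 <= lms <= 100.0 and 0.0 <= ss <= 100.0):
        raise ValueError("lms and ss must be percentages in [0, 100]")
    # The score is highest when ss is 50 and falls linearly to 0 as ss
    # approaches either 0 or 100; lms scales the result.
    return lms * min(ss, 100.0 - ss) / 50.0


if __name__ == "__main__":
    # Hypothetical inputs, not results reported in the paper.
    print(icat(lms=100.0, ss=50.0))  # ideal model under this definition
    print(icat(lms=80.0, ss=60.0))   # fluent but leaning stereotypical
```

Under this definition, a model scores well only if it is both a good language model (high LMS) and close to neutral (SS near 50), so a low SS is penalized just as much as a high one.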
Abstract
Advancements in Large Language Models (LLMs) have improved performance on a range of natural language understanding and generation tasks. Although LLMs have surpassed the previous state of the art on various tasks, they often reflect different forms of bias present in the training data. In light of this limitation, we provide a unified evaluation of benchmarks that cover different forms of bias, from physical characteristics to socio-economic categories, using a set of representative small and medium-sized LLMs. Moreover, we propose five prompting approaches to carry out the bias detection task across different aspects of bias. Further, we formulate three research questions to gain insight into detecting biases in LLMs using different approaches and evaluation metrics across benchmarks. The results indicate that each of the selected LLMs suffers from one form of bias or another, with the Phi-3.5B model being the least biased. Finally, we conclude the paper by identifying key challenges and possible future directions.
