Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models

Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

TL;DR

This paper addresses the gap in evaluating LLM bias from an India-centric perspective by introducing Indian-BhED, a dataset focused on caste and religion stereotypes. It measures bias in both encoder- and decoder-based models, comparing India-centric axes (caste, religion) with US-centric axes (race, gender) using the AUL and CLL metrics, and evaluates GPT-3.5 through its API with majority voting. Key findings show widespread stereotypical bias in the Indian context, especially for religion: GPT-2, GPT-2 Large, and GPT-3.5 exhibit substantial caste and religion bias (roughly 63–79% for caste and 69–72% for religion), while some models show relatively neutral or reversed biases on the US-centric axes. The work highlights the need for diverse, cross-cultural fairness evaluations and proposes mitigations and future directions, including multilingual extensions and better representation of Global South voices in AI research.

Abstract

Large Language Models (LLMs), now used daily by millions, can encode societal biases, exposing their users to representational harms. A large body of scholarship on LLM bias exists, but it predominantly adopts a Western-centric frame and attends comparatively less to bias levels and potential harms in the Global South. In this paper, we quantify stereotypical bias in popular LLMs according to an India-centric frame through Indian-BhED, a first-of-its-kind dataset containing stereotypical and anti-stereotypical examples in the context of caste and religious stereotypes in India. We find that the majority of LLMs tested have a strong propensity to output stereotypes in the Indian context, especially when compared to axes of bias traditionally studied in the Western context, such as gender and race. Notably, we find that GPT-2, GPT-2 Large, and GPT-3.5 have a particularly high propensity for preferring stereotypical outputs, measured as a percentage of all sentences, for the axes of caste (63–79%) and religion (69–72%). Finally, we investigate potential causes for such harmful behaviour in LLMs and posit intervention techniques to reduce both stereotypical and anti-stereotypical biases. The findings of this work highlight the need for including more diverse voices when researching fairness in AI and evaluating LLMs.
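
The percentages quoted in the abstract can be read as the share of sentence pairs for which a model prefers the stereotypical sentence over its anti-stereotypical counterpart. The following is a minimal sketch of that calculation, not the authors' implementation: the function name `stereotype_preference_rate` and the example scores are hypothetical, and the scores are assumed to be per-sentence log-likelihoods (higher means more preferred) produced by some scoring method such as AUL or CLL. Under this reading, a value near 50% would indicate no systematic preference.

```python
from typing import Sequence, Tuple


def stereotype_preference_rate(pair_scores: Sequence[Tuple[float, float]]) -> float:
    """Return the percentage of pairs whose stereotypical sentence scores higher.

    Each pair is (stereotypical_score, anti_stereotypical_score).
    Ties are counted as "not preferred" in this sketch (an assumption).
    """
    if not pair_scores:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for stereo, anti in pair_scores if stereo > anti)
    return 100.0 * preferred / len(pair_scores)


# Hypothetical scores for four sentence pairs; these are NOT values from the paper.
example_scores = [(-12.1, -13.4), (-9.8, -9.2), (-15.0, -16.3), (-11.5, -12.0)]
rate = stereotype_preference_rate(example_scores)
print(f"Stereotypical sentence preferred in {rate:.1f}% of pairs")
```

In this toy input the stereotypical sentence wins three of the four comparisons, so the rate is 75%.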

Paper Structure (27 sections, 2 equations, 4 tables)