Sometimes the Model doth Preach: Quantifying Religious Bias in Open LLMs through Demographic Analysis in Asian Nations
Hari Shankar, Vedanta S P, Tejas Cavale, Ponnurangam Kumaraguru, Abhijnan Chakraborty
TL;DR
The paper tackles religious bias in open LLMs within non-Western contexts by proposing a demographic-profiling framework that maps model responses to survey-derived demographic vectors using one-hot encoding and the Hamming distance $d_H$. It evaluates multiple open LLMs (e.g., Llama and Mistral) against Pew-style surveys from India and East/Southeast Asia, revealing that models often converge to a single homogeneous demographic profile, which raises concerns about hegemonic biases and minority representation. It further examines zero-shot steering prompts, finding them of limited effectiveness in altering a model's demographic alignment, and discusses data biases, alignment challenges, and potential mitigation strategies such as data augmentation or machine unlearning. The study provides an operational methodology for auditing LLMs' social biases in diverse global contexts, with implications for safety, fairness, and future research directions in steering and data curation.
Abstract
Large Language Models (LLMs) can generate opinions and unknowingly propagate bias that originates from unrepresentative, non-diverse data collection. Prior research has analyzed these opinions with respect to the West, particularly the United States. However, the insights thus produced may not generalize to non-Western populations. With LLM systems now widely used by people from many different walks of life, the cultural sensitivity of each generated output is of crucial interest. Our work proposes a novel method that quantitatively analyzes the opinions generated by LLMs, improving on previous work with regard to extracting the social demographics of the models. Our method measures the Hamming distance between an LLM's responses and those of survey respondents to infer the demographic characteristics reflected in the model's outputs. We evaluate modern, open LLMs such as Llama and Mistral on surveys conducted in various Global South countries, with a focus on India and other Asian nations, specifically assessing the models' performance on surveys related to religious tolerance and identity. Our analysis reveals that most open LLMs match a single homogeneous profile, which varies across countries/territories; this in turn raises questions about the risks of LLMs promoting a hegemonic worldview and undermining the perspectives of different minorities. Our framework may also be useful for future research investigating the complex intersection between training data, model architecture, and the resulting biases reflected in LLM outputs, particularly concerning sensitive topics like religious tolerance and identity.
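To make the distance-based profiling idea concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It one-hot encodes categorical survey answers, computes the Hamming distance between a model's answers and each respondent's answers, and reports the demographic labels of the closest respondents. The function names (`one_hot`, `hamming`, `nearest_profile`), the `group` field, and the toy questions and respondents are all hypothetical and chosen only for illustration; the abstract does not specify how ties or multiple demographic attributes are handled.

```python
from collections import Counter


def one_hot(answers, options):
    """Concatenate one one-hot block per question, in the order given by `options`."""
    vec = []
    for question, choices in options.items():
        vec.extend(1 if answers[question] == choice else 0 for choice in choices)
    return vec


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def nearest_profile(model_answers, respondents, options):
    """Return the minimum distance and a count of demographic labels at that distance."""
    model_vec = one_hot(model_answers, options)
    dists = [
        (hamming(model_vec, one_hot(r["answers"], options)), r["group"])
        for r in respondents
    ]
    best = min(d for d, _ in dists)
    return best, Counter(group for d, group in dists if d == best)


# Toy, made-up data (assumption): two questions and three respondents.
options = {"q1": ["yes", "no"], "q2": ["agree", "neutral", "disagree"]}
respondents = [
    {"group": "group_a", "answers": {"q1": "yes", "q2": "agree"}},
    {"group": "group_b", "answers": {"q1": "no", "q2": "disagree"}},
    {"group": "group_a", "answers": {"q1": "yes", "q2": "neutral"}},
]
model_answers = {"q1": "yes", "q2": "agree"}

best, profile = nearest_profile(model_answers, respondents, options)
print(best, dict(profile))
```

With one-hot blocks, each question answered differently contributes 2 to the Hamming distance, so in this sketch the distance is twice the number of disagreeing answers.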
