Religious Bias Landscape in Language and Text-to-Image Models: Analysis, Detection, and Debiasing Strategies
Ajwad Abrar, Nafisa Tabassum Oeshy, Mohsinul Kabir, Sophia Ananiadou
TL;DR
The paper probes religious bias in language and text-to-image models, highlighting persistent negative associations (notably with Islam) across masked prediction, prompt completion, and image generation. It introduces a cross-domain evaluation framework that uses 100 prompts per model for mask filling and prompt completion and 50 biased images per adjective for text-to-image (T2I) tasks, along with a Religious Bias Score (RBS) to quantify bias. Debiasing prompts (positive term augmentation and bias mitigation instructions) substantially reduce bias but do not eliminate it, although some open- and closed-source models show near-zero bias after intervention. The study also uncovers cross-domain biases linking religion to nationality, gender, and age, emphasizing the need for more robust training data and systemic debiasing approaches beyond prompt engineering. Overall, the work provides a public artifact of prompts and images to advance fairer, globally acceptable AI systems.
Abstract
Note: This paper includes examples of potentially offensive content related to religious bias, presented solely for academic purposes. The widespread adoption of language models highlights the need for critical examination of their inherent biases, particularly concerning religion. This study systematically investigates religious bias in language models and text-to-image generation models, covering both open-source and closed-source systems. We construct approximately 400 unique, naturally occurring prompts to probe these models for religious bias across diverse tasks, including mask filling, prompt completion, and image generation. Our experiments reveal concerning instances of underlying stereotypes and biases associated disproportionately with certain religions. Additionally, we explore cross-domain biases, examining how religious bias intersects with demographic factors such as gender, age, and nationality. We further evaluate the effectiveness of targeted debiasing techniques by employing corrective prompts designed to mitigate the identified biases. Our findings demonstrate that language models continue to exhibit significant biases in both text and image generation tasks, emphasizing the urgent need to develop fairer language models to achieve global acceptability.
