With a Grain of SALT: Are LLMs Fair Across Social Dimensions?

Samee Arif, Zohaib Khan, Maaidah Kaleem, Suhaib Rashid, Agha Ali Raza, Awais Athar

TL;DR

This work introduces SALT, a bias benchmark for open-source LLMs (Llama and Gemma) covering gender, religion, and race, built on five bias triggers that span debates and real-world tasks. It combines automated evaluation of anonymized outputs with human validation to measure bias in both generated content and model judging, while accounting for evaluation, positional, and length biases. The study reports consistent polarization in outputs (e.g., along religion and race) and shows that larger models can amplify biases, underscoring the limits of scaling for fairness. SALT thus provides a scalable framework and dataset for diagnosing bias and guiding mitigation, with practical relevance to safety, hiring, and information integrity.

Abstract

This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs) across gender, religion, and race. Our study evaluates bias in smaller-scale Llama and Gemma models using the SALT (Social Appropriateness in LLM-Generated Text) dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Career Advice, Problem Solving, and CV Generation. To quantify bias, we measure win rates in General Debate and the assignment of negative roles in Positioned Debate. For real-world use cases, such as Career Advice, Problem Solving, and CV Generation, we anonymize the outputs to remove explicit demographic identifiers and use DeepSeek-R1 as an automated evaluator. We also address inherent biases in LLM-based evaluation, including evaluation bias, positional bias, and length bias, and validate our results through human evaluations. Our findings reveal consistent polarization across models, with certain demographic groups receiving systematically favorable or unfavorable treatment. By introducing SALT, we provide a comprehensive benchmark for bias analysis and underscore the need for robust bias mitigation strategies in the development of equitable AI systems.

Paper Structure

This paper contains 35 sections, 1 equation, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: Gender Bias Scores across each trigger and model.
  • Figure 2: Religious Bias Scores for each model, aggregated across each trigger.
  • Figure 3: Racial Bias Scores for each model, computed in a pairwise manner, aggregated across all triggers.