With a Grain of SALT: Are LLMs Fair Across Social Dimensions?

Samee Arif; Zohaib Khan; Maaidah Kaleem; Suhaib Rashid; Agha Ali Raza; Awais Athar

TL;DR

This work introduces SALT, a bias benchmark for open-source LLMs (Llama and Gemma) covering gender, religion, and race, built on five bias triggers that span debates and real-world tasks. It combines automated, anonymized evaluation with human validation to measure bias both in generated content and in model judging, while accounting for evaluation, positional, and length biases. The study reports consistent polarization in outputs (e.g., religion and race biases) and shows that larger models can amplify biases, underscoring the limits of scaling as a route to fairness. SALT thus provides a scalable framework and dataset for diagnosing bias and guiding mitigation in equitable AI systems, with practical impact on safety, hiring, and information integrity.

Abstract

This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs) across gender, religion, and race. Our study evaluates bias in smaller-scale Llama and Gemma models using the SALT (Social Appropriateness in LLM-Generated Text) dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Career Advice, Problem Solving, and CV Generation. To quantify bias, we measure win rates in General Debate and the assignment of negative roles in Positioned Debate. For real-world use cases, such as Career Advice, Problem Solving, and CV Generation, we anonymize the outputs to remove explicit demographic identifiers and use DeepSeek-R1 as an automated evaluator. We also address inherent biases in LLM-based evaluation, including evaluation bias, positional bias, and length bias, and validate our results through human evaluations. Our findings reveal consistent polarization across models, with certain demographic groups receiving systematically favorable or unfavorable treatment. By introducing SALT, we provide a comprehensive benchmark for bias analysis and underscore the need for robust bias mitigation strategies in the development of equitable AI systems.
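
The win-rate measurement mentioned for General Debate can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (a list of participating groups plus the declared winner), the function name `win_rates`, and the placeholder labels `group_x`, `group_y`, and `group_z` are all assumptions made for illustration, and the outcomes are invented toy data rather than results from the paper. Listing the same pairing in both orders reflects one common way of dampening positional bias, which is likewise assumed here.

```python
from collections import Counter


def win_rates(debates):
    """Return the fraction of debates each group wins.

    `debates` is an iterable of (participants, winner) pairs, where
    `participants` is a list of group labels and `winner` is one of
    those labels, or None when no winner is declared.
    This layout is hypothetical, not the SALT dataset schema.
    """
    wins = Counter()
    appearances = Counter()
    for participants, winner in debates:
        for group in participants:
            appearances[group] += 1
        if winner is not None:
            wins[winner] += 1
    # Counter returns 0 for groups that never won.
    return {group: wins[group] / appearances[group] for group in appearances}


# Toy outcomes with placeholder labels (invented for illustration only).
debates = [
    (["group_x", "group_y"], "group_x"),
    (["group_y", "group_x"], "group_x"),  # same pairing, order swapped
    (["group_x", "group_z"], "group_z"),
    (["group_z", "group_x"], "group_x"),
    (["group_y", "group_z"], None),       # no winner declared
    (["group_z", "group_y"], "group_y"),
]

rates = win_rates(debates)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

Under this sketch, an unbiased judge would be expected to yield roughly equal win rates across groups when the arguments are of comparable quality; large, persistent gaps are the kind of polarization the abstract describes.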
