AGR: Age Group fairness Reward for Bias Mitigation in LLMs

Shuirong Cao, Ruoxi Cheng, Zhiqiang Wang

TL;DR

This work tackles age bias in LLMs, a less-explored fairness dimension, by introducing AGR (Age Group fairness Reward) and constructing the ABMB and ABMA age-bias preference datasets, along with the ABMB-IFT and ABMA-IFT instruction-tuning sets, derived from BBQ and ISB. It formalizes age-group fairness via per-age-group output statistics and presents a three-stage RLHF-like training pipeline in which the policy $\pi_\phi^{RL}$ is trained to maximize the objective $J(\phi)=\mathbb{E}_{y\sim\pi_\phi^{RL}}[R_\theta(y|x)]-\beta D_{KL}(\pi_\phi^{RL}\|\pi^{SFT})$. The reward is the age-group fairness reward $R_\theta^\lambda$, which balances per-age-group quality $\mathrm{Q}$ against disparity $\mathrm{D}_{\text{total}}$ with coefficient $\lambda$. Empirical evaluation across four open-source 7B LLMs shows that AGR improves content and tag-content accuracy and reduces age-related disparities compared with RLHF and other baselines. The authors release their datasets and code to support further research on age fairness in NLP.
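To make the two quantities in the summary concrete, here is a minimal Python sketch. It is not the authors' implementation. The summary does not spell out how $\mathrm{Q}$ and $\mathrm{D}_{\text{total}}$ are computed, so the sketch assumes that $\mathrm{Q}$ is the mean quality score over age groups, that $\mathrm{D}_{\text{total}}$ is the sum of pairwise absolute quality gaps, and that they are combined as $\mathrm{Q}-\lambda\,\mathrm{D}_{\text{total}}$. The function names `fairness_reward` and `kl_penalized_objective` and the group labels are hypothetical.

```python
from itertools import combinations
from statistics import mean


def fairness_reward(group_quality, lam=0.5):
    """Assumed form of a fairness-adjusted reward: Q - lam * D_total.

    group_quality maps an age-group label to a quality score.
    Q is taken as the mean score and D_total as the sum of pairwise
    absolute gaps; both definitions are assumptions, not the paper's.
    """
    scores = list(group_quality.values())
    q = mean(scores)
    d_total = sum(abs(a - b) for a, b in combinations(scores, 2))
    return q - lam * d_total


def kl_penalized_objective(rewards, logp_rl, logp_sft, beta=0.1):
    """Monte Carlo estimate of J = E[R] - beta * KL(pi_RL || pi_SFT).

    All three lists describe the same samples y ~ pi_RL: the reward of
    each sample and its log-probability under the RL and SFT policies.
    The KL term is estimated as the mean log-probability ratio.
    """
    kl_estimate = mean(rl - sft for rl, sft in zip(logp_rl, logp_sft))
    return mean(rewards) - beta * kl_estimate


if __name__ == "__main__":
    balanced = {"young": 0.8, "middle": 0.8, "old": 0.8}
    skewed = {"young": 0.9, "middle": 0.8, "old": 0.7}
    print(fairness_reward(balanced), fairness_reward(skewed))
```

Under these assumptions, two models with the same average quality receive different rewards when one of them treats age groups unevenly, which is the behavior the coefficient $\lambda$ is meant to control.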

Abstract

LLMs can exhibit age biases, resulting in unequal treatment of individuals across age groups. While much research has addressed racial and gender biases, age bias remains underexplored. The scarcity of instruction-tuning and preference datasets for age bias hampers its detection and measurement, and existing fine-tuning methods seldom address age-related fairness. In this paper, we construct age bias preference datasets and instruction-tuning datasets for RLHF. We introduce AGR, an age fairness reward that reduces differences in the response quality of LLMs across age groups. Extensive experiments demonstrate that this reward significantly improves response accuracy and reduces performance disparities across age groups. Our source code and datasets are available at the anonymous link https://anonymous.4open.science/r/FairRLHF-D445/readme.md.

Paper Structure (19 sections, 6 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Accuracy of different LLMs across various bias categories on the BBQ question-answering dataset.
  • Figure 2: Overview of Preference Dataset Construction.
  • Figure 3: Overview of the Three Steps of AGR.