FAIRGAMER: Evaluating Social Biases in LLM-Based Video Game NPCs

Bingkang Shi; Jen-tse Huang; Long Luo; Tianyu Zong; Hongzhu Yi; Yuanxiang Wang; Songlin Hu; Xiaodan Zhang; Zhongjiang Yao

TL;DR

FairGamer introduces the first benchmark to quantify social biases in LLM-driven game NPCs across three interaction patterns and four bias types, using a novel multivariate fairness metric (FairMCV). By mapping NPC interactions to game-theoretic settings (bargaining, allocation, and zero-sum competition) and assembling a large bilingual dataset, the approach reveals that bias is an intrinsic model property and can be amplified by larger models or difficult interaction regimes. Chain-of-Thought debiasing provides partial mitigation, but substantial bias persists, underscoring the need for robust debiasing or post-training interventions in game AI. The work offers a practical, data-driven framework for evaluating and improving NPC fairness, with implications for fair gameplay and user experience in diverse virtual worlds.

Abstract

Large Language Models (LLMs) have increasingly enhanced or replaced traditional Non-Player Characters (NPCs) in video games. However, these LLM-based NPCs inherit underlying social biases (e.g., race or class), posing fairness risks during in-game interactions. To address the limited exploration of this issue, we introduce FairGamer, the first benchmark to evaluate social biases across three interaction patterns: transaction, cooperation, and competition. FairGamer assesses four bias types (class, race, age, and nationality) across 12 distinct evaluation tasks using a novel metric, FairMCV. Our evaluation of seven frontier LLMs reveals that: (1) models exhibit biased decision-making, with Grok-4-Fast demonstrating the highest bias (average FairMCV = 76.9%); and (2) larger LLMs display more severe social biases, suggesting that increased model capacity inadvertently amplifies these biases. We release FairGamer at https://github.com/Anonymous999-xxx/FairGamer to facilitate future research on NPC fairness.
