Fairness Evaluation of Large Language Models in Academic Library Reference Services

Haining Wang; Jason Clark; Yueru Yan; Star Bradley; Ruiyang Chen; Yiqiong Zhang; Hengyi Fu; Zuoyu Tian

TL;DR

This study analyzes whether large language model (LLM)-mediated academic library reference services treat patrons equitably across sex, race/ethnicity, and institutional role. It introduces the Fairness Evaluation Protocol (FEP), a two-phase, model-agnostic audit that applies diagnostic classifiers to TF-IDF features, and runs it on six state-of-the-art LLMs (three commercial and three open) using synthetic patron data crafted to reflect realistic library interactions. The results show demographic neutrality across race/ethnicity and sex (with a single minor exception in one model) and clear role-based accommodation signals (formality, self-identification as librarians, and domain-specific vocabulary) that align with professional norms rather than bias. The authors advocate using FEP as a recurring evaluation tool for equitable AI-enabled library services and propose future work that expands the demographic scope, covers dialect variation, and examines real-world deployments beyond academia.
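To make the idea of a diagnostic classifier on TF-IDF features concrete, the following is a minimal sketch and not the authors' implementation. It assumes that responses differ by patron attribute if a simple classifier can predict that attribute from the response text better than chance. The nearest-centroid classifier, the leave-one-out scheme, the helper names, and the toy responses are all illustrative assumptions; the summary above does not specify the classifiers, the TF-IDF settings, or what the two phases of FEP contain.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased alphabetic tokens; a deliberately simple assumption.
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    # Smoothed inverse document frequency computed on the training documents only.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # L2-normalized TF-IDF vector; terms unseen in training are ignored.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def centroid(vectors):
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def loo_accuracy(texts, labels):
    # Leave-one-out accuracy of a nearest-centroid diagnostic classifier.
    correct = 0
    for i in range(len(texts)):
        train = [(t, y) for j, (t, y) in enumerate(zip(texts, labels)) if j != i]
        idf = fit_idf([t for t, _ in train])
        by_label = {}
        for t, y in train:
            by_label.setdefault(y, []).append(tfidf(t, idf))
        centroids = {y: centroid(vs) for y, vs in by_label.items()}
        x = tfidf(texts[i], idf)
        predicted = max(sorted(centroids), key=lambda y: dot(x, centroids[y]))
        correct += predicted == labels[i]
    return correct / len(texts)

def chance_level(labels):
    # Accuracy of always guessing the most frequent label.
    return max(Counter(labels).values()) / len(labels)

# Invented toy responses (not data from the study), labeled by patron role.
texts = [
    "Dear Professor, our subject librarian can arrange a research consultation.",
    "Dear Professor, the librarian recommends the citation database for your research.",
    "Dear Professor, a research consultation with the librarian is available.",
    "Hi! You can find the textbook on course reserves at the desk.",
    "Hi! Check course reserves for the textbook you need.",
    "Hi! The textbook is on course reserves, just ask at the desk.",
]
roles = ["faculty"] * 3 + ["student"] * 3

print("diagnostic accuracy:", loo_accuracy(texts, roles))
print("chance level:", chance_level(roles))
```

On this toy data the classifier separates the two roles well above the chance level, which is the kind of signal an audit would then inspect to decide whether the difference reflects appropriate accommodation or unequal treatment. Accuracy close to chance for an attribute would be consistent with the responses not differing detectably on it. With only six examples the numbers carry no statistical weight; a real audit would need many responses and a significance test.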

Abstract

As libraries explore large language models (LLMs) for use in virtual reference services, a key question arises: Can LLMs serve all users equitably, regardless of demographics or social status? While they offer great potential for scalable support, LLMs may also reproduce societal biases embedded in their training data, risking the integrity of libraries' commitment to equitable service. To address this concern, we evaluate whether LLMs differentiate responses across user identities by prompting six state-of-the-art LLMs to assist patrons differing in sex, race/ethnicity, and institutional role. We find no evidence of differentiation by race or ethnicity, and only minor evidence of stereotypical bias against women in one model. LLMs demonstrate nuanced accommodation of institutional roles through the use of linguistic choices related to formality, politeness, and domain-specific vocabularies, reflecting professional norms rather than discriminatory treatment. These findings suggest that current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.
