A Study of Implicit Ranking Unfairness in Large Language Models

Chen Xu; Wenjie Wang; Yuxin Li; Liang Pang; Jun Xu; Tat-Seng Chua

TL;DR

The authors emphasize the need for the community to identify and mitigate implicit unfairness in Large Language Models, in order to avert a potential deterioration of the reinforced human-LLM ecosystem.

Abstract

Recently, Large Language Models (LLMs) have demonstrated a superior ability to serve as ranking models. However, concerns have arisen because LLMs can exhibit discriminatory ranking behaviors based on users' sensitive attributes (e.g., gender). Worse still, in this paper, we identify a subtler form of discrimination in LLMs, termed implicit ranking unfairness, where LLMs exhibit discriminatory ranking patterns based solely on non-sensitive user profiles, such as user names. Such implicit unfairness is more widespread but less noticeable, threatening the ethical foundation. To comprehensively explore such unfairness, our analysis focuses on three research aspects: (1) We propose an evaluation method to investigate the severity of implicit ranking unfairness. (2) We uncover the causes of such unfairness. (3) To mitigate such unfairness effectively, we utilize a pair-wise regression method to conduct fair-aware data augmentation for LLM fine-tuning. Experiments demonstrate that our method outperforms existing approaches in ranking fairness, with only a small reduction in accuracy. Lastly, we emphasize the need for the community to identify and mitigate implicit unfairness, in order to avert a potential deterioration of the reinforced human-LLM ecosystem.

Paper Structure (30 sections, 6 equations, 9 figures, 6 tables)

Introduction
Preliminary
LLMs-based Ranking Tasks
Implicit Ranking Unfairness
Evaluation Settings
Non-sensitive Attribute Selection
Discrimination Measurement
Other Settings
Implicit Unfairness of LLMs
Existence of Implicit Unfairness
Implicit Ranking Unfairness Degree
Implicit Ranking Unfairness Traceback
Inferring Sensitive Attribute Ability
Word Embedding Similarities
Implicit Ranking Unfairness Mitigation
...and 15 more sections

Figures (9)

Figure 1: Overall workflow of our evaluation. The ranking lists output by LLMs should remain the same when the sensitive attributes in the prompts are replaced.
Figure 2: Discriminatory behaviors of LLMs (i.e., topic distribution $P(L_K(s))$) toward certain topics in the job and news domains for user names belonging to different Gender and Race groups.
Figure 3: Discriminatory ranking behaviors of LLMs (i.e., topic distribution $P(L_K(s))$) toward certain topics in the job and news domains for user names belonging to different Continent groups. A deeper red color indicates that LLMs are more likely to assign this type of news or job to users in the continent, while a deeper blue color indicates that LLMs are less likely to do so.
Figure 4: Discriminatory ranking behaviors of LLMs toward certain topics in the news domain for user emails. A deeper red/blue color indicates that LLMs are more/less likely to assign this type of news.
Figure 5: Similarity curves of different gender groups w.r.t. interaction rounds. Higher similarity indicates that the LLMs deliver more items related to the topics to users.
...and 4 more figures
