Item-side Fairness of Large Language Model-based Recommendation System

Meng Jiang; Keqin Bao; Jizhi Zhang; Wenjie Wang; Zhengyi Yang; Fuli Feng; Xiangnan He

TL;DR

This work investigates item-side fairness in LLM-based recommender systems (LRS), arguing that LRS inherits popularity-driven biases and semantic biases from pretraining. It introduces IFairLRS, a two-stage framework that applies reweighting during instruction-tuning and a punishment-based reranking during inference to calibrate item exposure. Empirical results on MovieLens1M and Steam show that IFairLRS improves fairness metrics such as $MGU@K$ and $DGU@K$ with minimal losses in $NDCG@K$ and $HR@K$, highlighting practical gains for fairer item exposure. The study also reveals that grounding and pretraining contribute to unfairness, suggesting future work on broader groupings and deeper analysis of LLM priors in LRS.

Abstract

Recommendation systems for Web content distribution are closely tied to the information access and exposure opportunities of vulnerable populations. The emergence of Large Language Model-based Recommendation Systems (LRS) may introduce additional societal challenges because of the inherent biases in Large Language Models (LLMs). The item-side fairness of LRS has not yet been comprehensively investigated, even though LRS differs from conventional recommendation systems in important ways. To bridge this gap, this study examines the properties of LRS with respect to item-side fairness and identifies two influencing factors: historical user interactions and the inherent semantic biases of LLMs. These findings point to the need to extend conventional item-side fairness methods to LRS. Towards this goal, we develop a concise and effective framework called IFairLRS to enhance the item-side fairness of an LRS. IFairLRS covers the main stages of building an LRS, with specifically adapted strategies to calibrate the recommendations of LRS. We use IFairLRS to fine-tune LLaMA, a representative LLM, on the MovieLens and Steam datasets, and observe significant item-side fairness improvements. The code is available at https://github.com/JiangM-C/IFairLRS.git.

Paper Structure (29 sections, 12 equations, 5 figures, 6 tables)

Introduction
Related Work
Item-side Fairness in Recommendation
LLM-based Recommendation System
Preliminary
Evaluation of Item-side Fairness
Brief on BIGRec
Probe the Item-side Fairness of LRS
Experiment Setting
Datasets
Compared Method
Performance on Item-side Fairness
Popularity Division
Genre Division
Cause of the Fairness Issues in LRS
...and 14 more sections

Figures (5)

Figure 1: The proportion distribution of different groups divided by popularity in the top-$K$ recommendation results, compared with the proportion distribution of different groups in the historical interactions (purple curve).
Figure 2: Comparison of GU@1 between groups divided by genre. We split these genre groups into two parts based on their interaction proportions in historical sequences, and each part has the same number of groups. "Pos GU" denotes GU@1 > 0, and "Neg GU" denotes GU@1 < 0. We can observe that high-popularity genres are over-recommended (Pos GU), while low-popularity genres tend to be overlooked (Neg GU).
Figure 3: Notations of (a): GH and GP denote the proportions of groups in historical interactions and recommendation results, respectively. The horizontal axis represents movie genres, where "Do", "Cr", "Ro", "Ac", and "Co" denote Documentary, Crime, Romance, Action, and Comedy, respectively, whose GHs increase from left to right. Notations of (b): the horizontal axis represents different tasks, where "Mov" and "St" denote the MovieLens1M and Steam datasets, respectively; "Pop" and "Gen" denote groups divided by popularity and genre, respectively.
Figure 4: The GU (Group Unfairness) of different groups divided by genres in top-$K$ recommendation results.
Figure 5: The proportions of various popularity groups of MovieLens1M and Steam.
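
The captions above refer to GH, GP, and GU without spelling out how they relate. The following Python sketch shows one plausible reading, under stated assumptions: GU of a group is taken to be GP minus GH, which matches the captions' statement that over-recommended groups have GU > 0; MGU is taken to be the mean absolute GU over groups, and DGU the gap between the largest and smallest GU. These definitions, the function names, and the toy data are illustrative assumptions, not taken from the paper's code.

```python
from collections import Counter


def group_proportions(items, item_to_group):
    """Share of each group among a flat list of item ids."""
    counts = Counter(item_to_group[i] for i in items)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.items()}


def group_unfairness(history, recommendations, item_to_group):
    """Assumed definition: GU(g) = GP(g) - GH(g) for every group g."""
    groups = sorted(set(item_to_group.values()))
    gh = group_proportions(history, item_to_group)          # GH: share in history
    gp = group_proportions(recommendations, item_to_group)  # GP: share in top-K results
    return {g: gp.get(g, 0.0) - gh.get(g, 0.0) for g in groups}


def mgu(gu):
    """Assumed definition: mean absolute GU across groups."""
    return sum(abs(v) for v in gu.values()) / len(gu)


def dgu(gu):
    """Assumed definition: largest GU minus smallest GU."""
    return max(gu.values()) - min(gu.values())


# Toy data (hypothetical): two groups with equal share in the history.
item_to_group = {"a": "popular", "b": "popular", "c": "niche", "d": "niche"}
history = ["a", "b", "c", "d"]
recommendations = ["a", "a", "b", "c"]  # flattened top-K lists over all users

gu = group_unfairness(history, recommendations, item_to_group)
print(gu)  # positive GU: over-recommended group; negative GU: overlooked group
print(mgu(gu), dgu(gu))
```

In this toy case the "popular" group takes three of four recommendation slots despite holding half of the historical interactions, so it receives a positive GU and the "niche" group a negative one, mirroring the Pos GU / Neg GU split described for Figure 2.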
