UFO: Unfair-to-Fair Evolving Mitigates Unfairness in LLM-based Recommender Systems via Self-Play Fine-tuning

Jiaming Zhang, Yuyuan Li, Xiaohua Feng, Zhifei Ren, Li Zhang, Chaochao Chen

TL;DR

This work traces item-side unfairness in LLM-based recommender systems (LRSs) to a combination of pre-training biases and supervised fine-tuning (SFT) adaptation. It introduces UFO, a self-play framework in which a judger and a corrector iteratively identify and mitigate unfairness, aided by distributional next-item generation and a geometric mixture that helps retain recommendation performance. The approach targets fairness at the distribution level by aligning outputs to a proxy fair distribution drawn from the SFT dataset. Experiments across multiple real-world datasets and base LLMs show that UFO reduces group unfairness while maintaining or improving NDCG@K and HR@K. Ablation studies support the importance of distributional generation, the mixture mechanism, and iterative evolution for robust fairness improvements. Overall, UFO is a post-training method for moving an LRS from an unfair state to a fair one, with a focus on item-side equity.
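For readers unfamiliar with the two accuracy metrics named above, the following is a minimal sketch of HR@K and NDCG@K under the common next-item setting, assuming exactly one ground-truth item per test case (so the ideal DCG is 1). The function names and the toy data are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
import math
from typing import Hashable, Sequence


def hit_rate_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """Return 1.0 if the ground-truth item appears in the top-k, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """NDCG@k with a single relevant item: 1 / log2(rank + 1), or 0.0 on a miss."""
    top_k = list(ranked[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the hit
    return 1.0 / math.log2(rank + 1)


def evaluate(rankings: Sequence[Sequence[Hashable]],
             targets: Sequence[Hashable],
             k: int) -> tuple[float, float]:
    """Average HR@k and NDCG@k over all test cases."""
    n = len(targets)
    hr = sum(hit_rate_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hr, ndcg


# Toy example (hypothetical data): the first user is hit at rank 2,
# the second user is missed entirely.
toy_rankings = [["a", "b", "c"], ["x", "y", "z"]]
toy_targets = ["b", "q"]
toy_hr, toy_ndcg = evaluate(toy_rankings, toy_targets, k=3)
```

HR@K only records whether the target was retrieved, while NDCG@K also rewards placing it nearer the top, which is why the two are usually reported together.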

Abstract

Large language model-based Recommender Systems (LRSs) have demonstrated superior recommendation performance by integrating pre-training with Supervised Fine-Tuning (SFT). However, this approach introduces item-side unfairness. Existing studies primarily attribute this issue to the absence of fairness constraints during SFT and attempt to mitigate unfairness via re-weighting and re-ranking methods. In this paper, we find that unfairness arises not only from SFT but also from pre-training, where inherent biases are further amplified during SFT. This finding underscores the failure of current methods to address the root causes of unfairness. Moreover, current methods struggle to preserve satisfactory recommendation performance. To tackle these issues, we propose an Unfair-to-Fair evOlving (UFO) framework using a self-play mechanism, formulating unfairness mitigation as a two-player game. UFO alternates between two player roles: the judger, which identifies unfairness from both pre-training and SFT, and the corrector, which adjusts the LRS to address identified unfairness while preserving recommendation performance. Iterative optimization between these roles enables UFO to completely resolve unfairness. Extensive experiments demonstrate that UFO effectively mitigates unfairness while improving recommendation performance.

Paper Structure

This paper contains 40 sections, 21 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Group-level Bias Estimation on ML-1M.
  • Figure 2: An illustration of the pipelines of SFT and our proposed UFO. SFT tends to amplify the inherent unfairness that pre-training introduces into LRSs. In contrast, UFO aims to mitigate unfairness within LRSs completely. By iteratively identifying unfairness and enhancing fairness through the self-play of a judger and a corrector, UFO enables an LRS to evolve from an unfair state to a fair one.
  • Figure 3: Fairness performance under different top-K recommendations on the Steam dataset. Pop. denotes Popularity Fairness, and Genre denotes Genre Fairness.
  • Figure 4: Effect of evolving iterations.
  • Figure 5: Effect of geometric mixture parameter $\alpha$.
  • ...and 1 more figure