Efficient and Responsible Adaptation of Large Language Models for Robust and Equitable Top-k Recommendations
Kirandeep Kaur, Vinayak Gupta, Manya Chadha, Chirag Shah
TL;DR
The paper tackles robustness and fairness gaps in recommender systems caused by data sparsity and subpopulation disparities. It proposes a two-phase hybrid framework that identifies weak users via sparsity $S_I(u)$ and performance $P(u)$, then leverages in-context learning with large language models to generate personalized rankings for these users, while strong users continue to receive rankings from traditional RSs. Across three real-world datasets and multiple RS baselines, the approach achieves significant improvements for weak users, with GPT-4 providing the strongest gains, and notably reduces computational costs by routing only a subset of users through the LLMs. The work demonstrates that instance-level, cost-aware LLM augmentation can enhance robustness and fairness without prohibitive overhead, offering a practical path for responsible deployment of foundation models in personalized ranking tasks.
Abstract
Conventional recommendation systems (RSs) are typically optimized to enhance performance metrics uniformly across all training samples, inadvertently overlooking the needs of diverse user populations. Because user properties vary, the resulting performance disparity across populations can harm the model's robustness to subpopulations. While large language models (LLMs) show promise in enhancing RS performance, their practical applicability is hindered by high costs, inference latency, and degraded performance on long user queries. To address these challenges, we propose a hybrid task allocation framework designed to promote social good by equitably serving all user groups. By adopting a two-phase approach, we promote a strategic assignment of tasks for efficient and responsible adaptation of LLMs. Our strategy works by first identifying the weak and inactive users for whom RSs deliver suboptimal ranking performance. Next, we use an in-context learning approach for such users, wherein each user's interaction history is contextualized as a distinct ranking task. We evaluate our hybrid framework by incorporating eight different recommendation algorithms and three different LLMs, both open- and closed-source. Our results on three real-world datasets show a significant reduction in the number of weak users and improved robustness to subpopulations without disproportionately escalating costs.
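The two-phase allocation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sparsity formula (the fraction of the catalogue a user has not interacted with), the two thresholds, the prompt wording, and all names (`UserStats`, `is_weak`, `route`, `build_ranking_prompt`) are assumptions made for illustration, and the per-user performance $P(u)$ is taken as a precomputed score in [0, 1] from the base RS.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    user_id: str
    history: list[str]   # titles of items the user interacted with
    performance: float   # P(u): per-user ranking quality from the base RS (assumed in [0, 1])


def sparsity(history: list[str], num_items: int) -> float:
    """Assumed S_I(u): share of the catalogue the user has NOT interacted with."""
    return 1.0 - len(set(history)) / num_items


def is_weak(user: UserStats, num_items: int,
            perf_threshold: float, sparsity_threshold: float) -> bool:
    """Phase 1: a user is weak if the RS serves them poorly AND their data is sparse.

    Both thresholds are hypothetical hyperparameters.
    """
    return (user.performance < perf_threshold
            and sparsity(user.history, num_items) > sparsity_threshold)


def route(users: list[UserStats], num_items: int,
          perf_threshold: float, sparsity_threshold: float):
    """Split users into (weak, strong); only weak users are sent to the LLM."""
    weak, strong = [], []
    for user in users:
        if is_weak(user, num_items, perf_threshold, sparsity_threshold):
            weak.append(user)
        else:
            strong.append(user)
    return weak, strong


def build_ranking_prompt(user: UserStats, candidates: list[str]) -> str:
    """Phase 2: cast one user's history as a standalone ranking task (illustrative wording)."""
    history_lines = "\n".join(f"- {title}" for title in user.history)
    candidate_lines = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(candidates))
    return (
        "The user has interacted with these items:\n"
        f"{history_lines}\n\n"
        "Rank the following candidate items from most to least relevant for this user:\n"
        f"{candidate_lines}"
    )


users = [
    UserStats("a", [f"Item {i}" for i in range(5)], performance=0.2),   # sparse, poorly served
    UserStats("b", [f"Item {i}" for i in range(50)], performance=0.8),  # active, well served
    UserStats("c", [f"Item {i}" for i in range(5)], performance=0.9),   # sparse, well served
]
weak, strong = route(users, num_items=100, perf_threshold=0.5, sparsity_threshold=0.9)
prompt = build_ranking_prompt(weak[0], ["Item A", "Item B"])
```

In this sketch only user `a` would be routed to the LLM. Users `b` and `c` keep their RS rankings, which is where the cost saving comes from, since LLM calls scale with the number of weak users rather than with the full user base.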
