Efficient and Responsible Adaptation of Large Language Models for Robust and Equitable Top-k Recommendations

Kirandeep Kaur; Vinayak Gupta; Manya Chadha; Chirag Shah

TL;DR

The paper tackles robustness and fairness gaps in recommender systems caused by data sparsity and subpopulation disparities. It proposes a two-phase hybrid framework that identifies weak users via sparsity $S_I(u)$ and performance $P(u)$, then leverages in-context learning with large language models to generate personalized rankings for these users, while strong users continue to receive rankings from traditional RSs. Across three real-world datasets and multiple RS baselines, the approach achieves significant improvements for weak users, with GPT-4 providing the strongest gains, and notably reduces computational costs by routing only a subset of users through the LLMs. The work demonstrates that instance-level, cost-aware LLM augmentation can enhance robustness and fairness without prohibitive overhead, offering a practical path for responsible deployment of foundation models in personalized ranking tasks.

Abstract

Conventional recommendation systems (RSs) are typically optimized to enhance performance metrics uniformly across all training samples, inadvertently overlooking the needs of diverse user populations. The resulting performance disparity among populations can harm the model's robustness to sub-populations with varying user properties. While large language models (LLMs) show promise in enhancing RS performance, their practical applicability is hindered by high costs, inference latency, and degraded performance on long user queries. To address these challenges, we propose a hybrid task allocation framework designed to promote social good by equitably serving all user groups. By adopting a two-phase approach, we promote a strategic assignment of tasks for efficient and responsible adaptation of LLMs. Our strategy works by first identifying the weak and inactive users who receive suboptimal ranking performance from RSs. Next, we use an in-context learning approach for such users, wherein each user's interaction history is contextualized as a distinct ranking task. We evaluate our hybrid framework by incorporating eight different recommendation algorithms and three different LLMs, both open- and closed-source. Our results on three real-world datasets show a significant reduction in weak users and improved robustness to subpopulations without disproportionately escalating costs.
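
The first phase described above (identifying weak users before any LLM call is made) can be illustrated with a minimal sketch. This is not the authors' implementation: the sparsity formula, the thresholds, and the names `UserStats`, `sparsity_index`, and `split_users` are assumptions introduced here for illustration. The only details taken from the text are that users are characterized by a sparsity index $S_I(u)$ and an RS performance score $P(u)$, and that only highly sparse, poorly served users are routed to the LLM.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    user_id: str
    n_interactions: int  # number of items the user has interacted with
    performance: float   # per-user RS metric P(u), assumed to lie in [0, 1]


def sparsity_index(n_interactions: int, n_items: int) -> float:
    """Assumed definition: share of the catalogue the user has NOT interacted with."""
    return 1.0 - n_interactions / n_items


def split_users(users, n_items, sparsity_threshold, performance_threshold):
    """Return (weak, strong) user-id lists.

    A user is treated as weak when they are both highly sparse and poorly
    served by the RS. Both thresholds are hypothetical tuning parameters.
    """
    weak, strong = [], []
    for u in users:
        s = sparsity_index(u.n_interactions, n_items)
        if s > sparsity_threshold and u.performance < performance_threshold:
            weak.append(u.user_id)    # candidate for LLM-based ranking
        else:
            strong.append(u.user_id)  # keeps the RS ranking
    return weak, strong


# Toy data, invented for illustration only.
users = [
    UserStats("u1", n_interactions=5, performance=0.10),
    UserStats("u2", n_interactions=400, performance=0.62),
    UserStats("u3", n_interactions=8, performance=0.55),
]
weak, strong = split_users(
    users, n_items=1000, sparsity_threshold=0.99, performance_threshold=0.3
)
print(weak, strong)
```

In this toy run only `u1` is both sparse and poorly served, so only that user's history would be contextualized for the LLM. Routing a subset of users in this way is where the cost savings claimed in the abstract would come from.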

Paper Structure (22 sections, 7 equations, 8 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Robustness in Recommendation Systems
Addressing Robustness to Data Sparsity and Subpopulations:
Large Language Models in Recommendation Systems
Challenges and Adaptation Techniques:
Methodology
Problem Formulation
Identifying Weak Users
Designing Natural Language Instructions for Ranking
Our Framework
Experiments
Experimental Setup
Datasets.
Baselines and Models.
...and 7 more sections

Figures (8)

Figure 1: An overview of our framework that uses task allocation to adapt LLMs responsibly. We compute each user's sparsity index ($S_I$), evaluate recommendations retrieved from RS using performance metric ($P(u_m)$), and plot $P(u_m)$ against $S_I$. Interaction histories of highly sparse users with low $P(u_m)$ are contextualized and given to LLM for ranking. Strong users receive RS recommendations, while weak users get LLM recommendations if LLM outperforms RS.
Figure 2: Instruction template for contextualizing interaction histories of weak users.
Figure 3: Histograms illustrating the distribution of the number of ratings per user for three datasets.
Figure 4: Distribution of user sparsity across three datasets.
Figure 5: Comparative analysis of user distribution based on average sparsity levels.
...and 3 more figures
