Efficient and Responsible Adaptation of Large Language Models for Robust Top-k Recommendations
Kirandeep Kaur, Chirag Shah
TL;DR
This work addresses robustness gaps in recommender systems by combining traditional RSs with large language models through a responsible, per-user task allocation strategy. It first identifies weak and inactive users using a dual criterion based on a sparsity index $S_I(u)$ and ranking performance $P(u)$. It then uses in-context learning to prompt an LLM with each weak user's interaction history, framed as a ranking task, while strong users keep the rankings produced by the RS. Across three real-world datasets and multiple baselines, the approach significantly improves weak-user performance and overall robustness (about 12%), substantially reduces the number of weak users, and incurs manageable cost increases that depend on the LLM used. The results indicate that open-source LLMs can achieve results competitive with closed models when deployed selectively, enabling practical, responsible deployment of generative models in recommendation systems.
Abstract
Conventional recommendation systems (RSs) are typically optimized to enhance performance metrics uniformly across all training samples. This makes it hard for data-driven RSs to cater to a diverse set of users, because the properties of these users vary. The resulting performance disparity among populations can harm the model's robustness with respect to sub-populations. While recent works have shown promising results in adapting large language models (LLMs) for recommendation to address hard samples, long user queries from millions of users can degrade the performance of LLMs and increase costs, processing times, and inference latency. This challenges the practical applicability of LLMs for recommendation. To address this, we propose a hybrid task allocation framework that utilizes the capabilities of both LLMs and traditional RSs. By adopting a two-phase approach to improve robustness to sub-populations, we promote a strategic assignment of tasks for efficient and responsible adaptation of LLMs. Our strategy works by first identifying the weak and inactive users who receive suboptimal ranking performance from RSs. Next, we use an in-context learning approach for such users, wherein each user's interaction history is contextualized as a distinct ranking task and given to an LLM. We test our hybrid framework by incorporating various recommendation algorithms (collaborative filtering and learning-to-rank recommendation models) and two LLMs (open- and closed-source). Our results on three real-world datasets show a significant reduction in the number of weak users, improved robustness of RSs to sub-populations ($\approx 12\%$), and improved overall performance, without disproportionately escalating costs.
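The two-phase allocation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the formula used for the sparsity index, the rule that a user is weak only when both conditions hold, the threshold values, the prompt wording, and all names (`UserRecord`, `sparsity_index`, `is_weak`, `build_ranking_prompt`, `allocate`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class UserRecord:
    user_id: str
    history: List[str]  # items the user interacted with, oldest first
    rs_score: float     # P(u): ranking quality the RS achieves for this user


def sparsity_index(history: Sequence[str], n_items: int) -> float:
    """Assumed stand-in for S_I(u): share of the catalogue the user has not touched."""
    return 1.0 - len(set(history)) / n_items


def is_weak(user: UserRecord, n_items: int,
            perf_threshold: float, sparsity_threshold: float) -> bool:
    """Assumed dual criterion: low RS performance AND a sparse history."""
    low_performance = user.rs_score < perf_threshold
    sparse_history = sparsity_index(user.history, n_items) > sparsity_threshold
    return low_performance and sparse_history


def build_ranking_prompt(history: Sequence[str], candidates: Sequence[str]) -> str:
    """Frame one user's history as a ranking task (hypothetical prompt wording)."""
    seen = "\n".join(f"- {item}" for item in history)
    pool = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(candidates))
    return (
        "The user interacted with these items, oldest first:\n"
        f"{seen}\n\n"
        "Rank the following candidates from most to least relevant:\n"
        f"{pool}"
    )


def allocate(users: Sequence[UserRecord], n_items: int,
             perf_threshold: float = 0.3,
             sparsity_threshold: float = 0.9) -> Dict[str, str]:
    """Route each user to 'llm' (weak) or 'rs' (strong)."""
    return {
        u.user_id: "llm" if is_weak(u, n_items, perf_threshold, sparsity_threshold) else "rs"
        for u in users
    }


if __name__ == "__main__":
    users = [
        UserRecord("u1", ["item_a", "item_b"], rs_score=0.10),
        UserRecord("u2", [f"item_{i}" for i in range(50)], rs_score=0.10),
        UserRecord("u3", ["item_c"], rs_score=0.80),
    ]
    routes = allocate(users, n_items=100)
    print(routes)
    for user in users:
        if routes[user.user_id] == "llm":
            print(build_ranking_prompt(user.history, ["item_x", "item_y"]))
```

Only the users routed to `"llm"` would incur LLM inference cost in such a setup, which is the intuition behind selective deployment; in practice the thresholds would need to be tuned per dataset and base recommender.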
