
Aligning Large Language Models for Controllable Recommendations

Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

TL;DR

This work tackles aligning LLMs for controllable recommendations by introducing a two-stage framework: a supervised learning stage with intention-aware data-generation tasks and label augmentation from a traditional recommender (SASRec), followed by a reinforcement learning stage with tailored item- and list-level rewards. The approach explicitly models intention categories (Implicit, Item-wise, List-wise) and optimizes for instruction adherence and formatting quality, reducing errors common in domain-tuned LLMs. Across two real-world datasets, the method consistently surpasses general LLMs and prior fine-tuned baselines in instruction-following and controllability, while achieving performance close to that of a traditional teacher model on standard accuracy metrics. The results demonstrate the practical potential of LLMs as interactive, explainable, and controllable recommender agents, with broader implications for building user-aligned conversational recommendation systems.

Abstract

Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their use in building the next generation of recommender systems: systems that are conversational, explainable, and controllable. However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy, often neglecting the ability to follow instructions. To address this gap, we first introduce a collection of supervised learning tasks, augmented with labels derived from a conventional recommender model, aimed at explicitly improving LLMs' proficiency in adhering to recommendation-specific instructions. We then develop a reinforcement learning-based alignment procedure to further strengthen LLMs' aptitude in responding to users' intentions and mitigating formatting errors. Through extensive experiments on two real-world datasets, our method markedly advances the capability of LLMs to comply with instructions within recommender systems while sustaining a high level of accuracy.

Paper Structure (39 sections, 8 equations, 1 figure, 8 tables, 1 algorithm)