What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context

Zhongyu Ouyang, Qianlong Wen, Chunhui Zhang, Yanfang Ye, Soroush Vosoughi

TL;DR

The paper examines why LLM-based sequential recommender systems fail to mimic human-like decision strategies. It identifies two overlooked factors, preference intensity and temporal context, and introduces RecPO, an adaptive-margin framework that encodes graded preferences and immediacy into the alignment objective. By deriving a margin $\gamma_r$ and integrating Plackett-Luce ranking with a Bradley-Terry (BT)-style probability, RecPO improves recommendation accuracy while exhibiting human-like behaviors such as prioritizing timely satisfaction and exercising aversion under shifting contexts. Experiments on five real-world datasets show that RecPO outperforms prior SFT and alignment-based baselines and displays robust, human-aligned patterns across diverse interaction histories, which suggests practical value for real-time, context-aware recommendation.

Abstract

Sequential recommendation systems aspire to profile users by interpreting their interaction histories, echoing how humans make decisions by weighing experience, relative preference strength, and situational relevance. Yet, existing large language model (LLM)-based recommenders often fall short of mimicking the flexible, context-aware decision strategies humans exhibit, neglecting the structured, dynamic, and context-aware mechanisms fundamental to human behaviors. To bridge this gap, we propose RecPO, a preference optimization framework that models structured feedback and contextual delay to emulate human-like prioritization in sequential recommendation. RecPO exploits adaptive reward margins based on inferred preference hierarchies and temporal signals, enabling the model to favor immediately relevant items and to distinguish between varying degrees of preference and aversion. Extensive experiments across five real-world datasets demonstrate that RecPO not only yields performance gains over state-of-the-art baselines, but also mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.
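
The idea of an adaptive reward margin can be illustrated with a minimal sketch. This is not the RecPO objective, whose exact form is not given in this summary; it only shows how a margin enters a Bradley-Terry-style pairwise loss. The function names and the `illustrative_margin` rule (a margin that grows with the rating gap and shrinks with delay) are assumptions made purely for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def margin_bt_loss(reward_preferred: float,
                   reward_dispreferred: float,
                   margin: float) -> float:
    """Negative log-likelihood of a Bradley-Terry comparison with a margin.

    The preferred item must beat the dis-preferred one by at least
    `margin` before the loss drops below log(2).
    """
    return -log_sigmoid(reward_preferred - reward_dispreferred - margin)


def illustrative_margin(rating_gap: float, delay: float,
                        scale: float = 1.0) -> float:
    """Hypothetical margin rule (an assumption, not the paper's gamma_r).

    Larger rating gaps widen the margin; longer delays before the
    preferred item is consumed shrink it.
    """
    return scale * rating_gap / (1.0 + delay)


# Same reward gap, two different contexts: an immediate strong preference
# demands a wider separation than a delayed one.
immediate = margin_bt_loss(1.0, 0.0, illustrative_margin(2.0, delay=0.0))
delayed = margin_bt_loss(1.0, 0.0, illustrative_margin(2.0, delay=3.0))
```

With a fixed reward gap, the loss under the immediate, strongly preferred pair is higher than under the delayed pair, so the optimizer is pushed harder to separate items whose preference is both intense and timely.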

Paper Structure

This paper contains 43 sections, 18 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Human behaviors involve trade-offs among preference intensity, satisfaction delay, effort, and risk, factors largely overlooked in current LLM-based preference modeling.
  • Figure 2: Hit@1 in next favorable item prediction with comprehensive and structured preference feedback.
  • Figure 3: Illustrations of preference learning frameworks with binary and enriched preference: the former assumes a binary distinction in preference, while the latter enriches the distinction with preference intensity and temporal context ($\delta$ indicates the enrichment).
  • Figure 4: Comparison of SFT, S-DPO, and RecPO in terms of adhering to contextual preference (a), avoiding unfavorable items under temptation (b), identifying dis-preferred items (c), and performing consistently across varying user history lengths (d). The adherence rate and avoidance rate are defined in the paper's experimental analysis section.
  • Figure 5: Textual prompt examples for Amazon-books and MovieLens.
  • ...and 1 more figure