DTRec: Learning Dynamic Reasoning Trajectories for Sequential Recommendation
Yifan Shao, Peilin Zhou, Shoujin Wang, Weizhi Zhang, Xu Cai, Sunghun Kim
TL;DR
DTRec tackles the rigidity of static reasoning trajectories in reasoning-enhanced sequential recommendation by learning dynamic reasoning trajectories along two axes: direction and depth. It introduces Hierarchical Process Supervision (HPS) to guide coarse-to-fine reasoning and Adaptive Reasoning Halting (ARH) to adjust the number of reasoning steps per user. On three real-world datasets, it improves performance by up to 24.5% over strong baselines while reducing computational cost by up to 41.6%. The work provides insights into interpretable reasoning trajectories and adaptive-depth inference for scalable, reasoning-enhanced recommender systems.
Abstract
Inspired by advances in LLMs, reasoning-enhanced sequential recommendation performs multi-step deliberation before making final predictions, unlocking greater potential for capturing user preferences. However, current methods are constrained by static reasoning trajectories that are ill-suited to the diverse complexity of user behaviors. They suffer from two key limitations: (1) a static reasoning direction, which uses flat supervision signals misaligned with human-like hierarchical reasoning, and (2) a fixed reasoning depth, which inefficiently applies the same computational effort to all users, regardless of pattern complexity. These rigidities lead to suboptimal performance and significant computational waste. To overcome these challenges, we propose DTRec, a novel and effective framework that explores the Dynamic reasoning Trajectory for Sequential Recommendation along both direction and depth. To guide the direction, we develop Hierarchical Process Supervision (HPS), which provides coarse-to-fine supervisory signals to emulate the natural, progressive refinement of human cognitive processes. To optimize the depth, we introduce the Adaptive Reasoning Halting (ARH) mechanism, which dynamically adjusts the number of reasoning steps by jointly monitoring three indicators. Extensive experiments on three real-world datasets demonstrate the superiority of our approach, achieving up to a 24.5% performance improvement over strong baselines while simultaneously reducing computational cost by up to 41.6%.
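To make the idea of adaptive-depth reasoning concrete, the following is a minimal sketch of a halting loop that stops early for "easy" users and runs to the step budget for "hard" ones. It is not DTRec's ARH: the abstract does not name the three indicators it monitors, so the ones used here (prediction entropy, step-to-step change in the predicted distribution, and a step budget) are illustrative placeholders. The names `should_halt`, `reason_with_halting`, `step_fn`, and both thresholds are likewise hypothetical.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of item scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_halt(prev_probs: Sequence[float], probs: Sequence[float], step: int,
                *, max_steps: int,
                entropy_threshold: float = 0.5,
                change_threshold: float = 0.01) -> bool:
    """Hypothetical halting rule built from three placeholder indicators."""
    # Placeholder indicator 1: the prediction is already confident (low entropy).
    confident = entropy(probs) < entropy_threshold
    # Placeholder indicator 2: the prediction barely moved since the last step.
    stable = sum(abs(a - b) for a, b in zip(prev_probs, probs)) < change_threshold
    # Placeholder indicator 3: the step budget is exhausted.
    out_of_budget = step >= max_steps
    return out_of_budget or (confident and stable)


def reason_with_halting(step_fn: Callable[[Any], Tuple[Any, Sequence[float]]],
                        state: Any,
                        max_steps: int = 8) -> Tuple[List[float], int]:
    """Run reasoning steps until the halting rule fires.

    `step_fn` is an assumed interface: it maps the current reasoning state to
    (next_state, item_logits). Returns the final distribution and steps used.
    """
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    prev_probs: Optional[List[float]] = None
    probs: List[float] = []
    for step in range(1, max_steps + 1):
        state, logits = step_fn(state)
        probs = softmax(logits)
        if prev_probs is not None and should_halt(
                prev_probs, probs, step, max_steps=max_steps):
            return probs, step
        prev_probs = probs
    return probs, max_steps


# Toy usage: a user whose scores are sharply peaked and unchanging halts early,
# while a user with uniform (uncertain) scores uses the full budget.
easy_probs, easy_steps = reason_with_halting(lambda s: (s, [5.0, 0.0, 0.0]), None)
hard_probs, hard_steps = reason_with_halting(lambda s: (s, [0.0, 0.0, 0.0]), None)
```

In this toy setup the saving comes entirely from users whose predictions become confident and stable before the budget runs out; how DTRec actually combines its three indicators is described in the paper body, not in the abstract.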
