DTRec: Learning Dynamic Reasoning Trajectories for Sequential Recommendation

Yifan Shao, Peilin Zhou, Shoujin Wang, Weizhi Zhang, Xu Cai, Sunghun Kim

TL;DR

DTRec tackles the rigidity of static reasoning trajectories in reasoning-enhanced sequential recommendation by learning dynamic reasoning trajectories along direction and depth. It introduces Hierarchical Process Supervision to guide coarse-to-fine reasoning and Adaptive Reasoning Halting to allocate computation adaptively. On three real-world datasets, the method reports up to a 24.5% performance improvement over strong baselines while reducing computational cost by up to 41.6%. The work provides insights into interpretable reasoning trajectories and adaptive-depth inference for scalable, reasoning-enhanced recommender systems.

Abstract

Inspired by advances in LLMs, reasoning-enhanced sequential recommendation performs multi-step deliberation before making final predictions, unlocking greater potential for capturing user preferences. However, current methods are constrained by static reasoning trajectories that are ill-suited to the diverse complexity of user behaviors. They suffer from two key limitations: (1) a static reasoning direction, which uses flat supervision signals misaligned with human-like hierarchical reasoning, and (2) a fixed reasoning depth, which inefficiently applies the same computational effort to all users, regardless of pattern complexity. These rigidities lead to suboptimal performance and significant computational waste. To overcome these challenges, we propose DTRec, a novel and effective framework that explores the Dynamic reasoning Trajectory for Sequential Recommendation along both direction and depth. To guide the direction, we develop Hierarchical Process Supervision (HPS), which provides coarse-to-fine supervisory signals to emulate the natural, progressive refinement of human cognitive processes. To optimize the depth, we introduce the Adaptive Reasoning Halting (ARH) mechanism, which dynamically adjusts the number of reasoning steps by jointly monitoring three indicators. Extensive experiments on three real-world datasets demonstrate the superiority of our approach, achieving up to a 24.5% performance improvement over strong baselines while simultaneously reducing computational cost by up to 41.6%.
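The adaptive-depth idea behind ARH can be illustrated with a minimal sketch. The abstract does not name the three indicators or say how they are combined, so everything below is an assumption made for illustration: the function `reason_with_halting`, the toy step `toy_step`, the two stand-in indicators `small_change` and `small_norm`, and the rule "halt only when every indicator agrees" are all hypothetical and are not the paper's method.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]
Indicator = Callable[[State, State], bool]  # (previous, current) -> vote to halt


def reason_with_halting(
    state: State,
    step_fn: Callable[[State], State],
    indicators: Sequence[Indicator],
    max_steps: int = 8,
) -> Tuple[State, int]:
    """Apply step_fn repeatedly; stop when all indicators vote to halt
    (an assumed combination rule) or when max_steps is reached."""
    steps = 0
    while steps < max_steps:
        new_state = step_fn(state)
        steps += 1
        halt = all(ind(state, new_state) for ind in indicators)
        state = new_state
        if halt:
            break
    return state, steps


# Hypothetical stand-ins, NOT the indicators used in the paper.
def toy_step(s: State) -> State:
    """Toy refinement step: halve every component."""
    return [0.5 * x for x in s]


def small_change(prev: State, cur: State, tol: float = 0.1) -> bool:
    """Vote to halt once the state has nearly stopped moving."""
    return max(abs(a - b) for a, b in zip(prev, cur)) < tol


def small_norm(prev: State, cur: State, tol: float = 0.2) -> bool:
    """Vote to halt once the current state is close to zero."""
    return max(abs(x) for x in cur) < tol


_, hard_steps = reason_with_halting([1.0], toy_step, [small_change, small_norm])
_, easy_steps = reason_with_halting([0.1], toy_step, [small_change, small_norm])
_, capped_steps = reason_with_halting([1.0], toy_step, [lambda p, c: False], max_steps=3)
```

In this toy setting the input that starts farther from convergence consumes more steps than the one that starts close to it, and `max_steps` bounds the cost when no indicator ever fires. That is the qualitative behavior the abstract attributes to adaptive-depth reasoning, shown here only by analogy.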

Paper Structure

This paper contains 22 sections, 10 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1:
  • Figure 2: Visualization of Reasoning Trajectory
  • Figure 3: Average reasoning steps for users grouped by interaction length. G1 denotes the group of users with the lowest average number of interactions.