Listwise Preference Diffusion Optimization for User Behavior Trajectories Prediction

Hongtao Huang, Chengkai Huang, Junda Wu, Tong Yu, Julian McAuley, Lina Yao

TL;DR

This work introduces User Behavior Trajectory Prediction (UBTP), moving beyond next-item forecasting to generate coherent multi-step action sequences. It proposes Listwise Preference Diffusion Optimization (LPDO), a diffusion-based framework that integrates a Plackett–Luce listwise ranking signal into the diffusion ELBO to capture global item dependencies across a trajectory. A principled derivation yields a tight ELBO that couples reconstruction fidelity with listwise ranking, and a new SeqMatch metric evaluates trajectory-level agreement. Empirical results on three real-world benchmarks show LPDO achieving state-of-the-art performance in both accuracy and sequence coherence, establishing a new benchmark for structured preference learning with diffusion models in sequential recommendation.

Abstract

Forecasting multi-step user behavior trajectories requires reasoning over structured preferences across future actions, a challenge overlooked by traditional sequential recommendation. This problem is critical for applications such as personalized commerce and adaptive content delivery, where anticipating a user's complete action sequence enhances both satisfaction and business outcomes. We identify an essential limitation of existing paradigms: their inability to capture global, listwise dependencies among sequence items. To address this, we formulate User Behavior Trajectory Prediction (UBTP) as a new task setting that explicitly models long-term user preferences. We introduce Listwise Preference Diffusion Optimization (LPDO), a diffusion-based training framework that directly optimizes structured preferences over entire item sequences. LPDO incorporates a Plackett-Luce supervision signal and derives a tight variational lower bound aligned with listwise ranking likelihoods, enabling coherent preference generation across denoising steps and overcoming the independent-token assumption of prior diffusion methods. To rigorously evaluate multi-step prediction quality, we propose the task-specific metric Sequential Match (SeqMatch), which measures exact trajectory agreement, and adopt Perplexity (PPL), which assesses probabilistic fidelity. Extensive experiments on real-world user behavior benchmarks demonstrate that LPDO consistently outperforms state-of-the-art baselines, establishing a new benchmark for structured preference learning with diffusion models.
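To make the listwise signal concrete, the following is a minimal sketch of a Plackett-Luce log-likelihood for an ordered list of items. It is not the authors' implementation: the function name `plackett_luce_log_likelihood`, the flat list of per-item scores, and the treatment of a partial ranking (the normalizer covers all items not yet placed) are assumptions made for illustration, and the coupling with the diffusion ELBO is not shown.

```python
import math


def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` under a Plackett-Luce model.

    scores  : list of real-valued preference scores, one per candidate item
              (hypothetical input; in LPDO these would come from the model).
    ranking : item indices in preferred order, best first. It may be a
              partial (top-K) ranking; repeated indices raise KeyError.
    """
    remaining = set(range(len(scores)))
    log_lik = 0.0
    for item in ranking:
        # Log-sum-exp over the items not yet placed, shifted by the max
        # score for numerical stability.
        max_score = max(scores[j] for j in remaining)
        log_norm = max_score + math.log(
            sum(math.exp(scores[j] - max_score) for j in remaining)
        )
        # Probability of picking `item` next among the remaining items.
        log_lik += scores[item] - log_norm
        remaining.remove(item)
    return log_lik


# With equal scores, each of the 3! orderings has probability 1/6.
uniform = plackett_luce_log_likelihood([0.0, 0.0, 0.0], [0, 1, 2])
# An ordering that agrees with the scores is more likely than its reverse.
aligned = plackett_luce_log_likelihood([2.0, 1.0, 0.0], [0, 1, 2])
reverse = plackett_luce_log_likelihood([2.0, 1.0, 0.0], [2, 1, 0])
```

Because each factor is normalized over the items still unplaced, the probability of an item at one position depends on what was chosen before it, which is the dependency a position-wise objective ignores.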

Paper Structure

This paper contains 31 sections, 33 equations, 6 figures, 11 tables, 2 algorithms.

Figures (6)

  • Figure 1: Examples of user trajectory predictions and a comparison of optimization strategies. The circles represent a series of theme-related movies (e.g., the Harry Potter film series), and the crosses indicate unrelated movies that the user does not prefer; the dashed line denotes the predicted list of movies that the user might be interested in; the color bar on the left shows spatial probability. (a) A non-diffusion (non-DM) model (e.g., SASRec) is typically deterministic and predicts a fixed trajectory, which often fails to capture users' latent preferences. (b) A traditional diffusion model captures a preference distribution to produce a more robust trajectory, but the distribution may overlap with unrelated targets. (c) A preference-aware diffusion model incorporates user preference into the sampling process, concentrating the trajectory distribution on related targets and yielding more coherent recommendation lists. (d) Comparison of optimization objectives: position-wise preference optimization (top) independently maximizes each position's likelihood and ignores inter-item dependencies; list-wise preference optimization (bottom) maximizes the joint likelihood of the entire ordered list, better capturing ordering and dependencies among items and producing more consistent, accurate behavior trajectories.
  • Figure 2: Illustration of LPDO.
  • Figure 3: Ablation and analysis of LPDO on the ML-1M (len=5) dataset. (a) Training process comparison between LPDO and DCRec, showing faster convergence and higher SeqMatch@50 for LPDO. (b) Position-wise comparison of HR@5 across different models, where LPDO consistently outperforms SASRec and DiffuRec at each position of the predicted trajectory. (c) Impact of the penalty factor $\gamma$ in $\mathcal{L}_{\text{Total}}$. (d) Effect of the loss ratio $\lambda$, indicating that the best performance is achieved at moderate values of $\lambda$.
  • Figure 4: Illustration of the proposed SeqMatch@N metric. The sequence length is 4 and N = 3. The icons were generated using OpenAI's ChatGPT. These icons are solely for illustrative purposes.
  • Figure 5: Illustration of preference-aware and non-preference prediction on the MovieLens-1M dataset. Due to copyright considerations, we do not show the original movie posters.
  • ...and 1 more figure
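
The SeqMatch@N metric illustrated in Figure 4 can be sketched as follows. The exact definition is not given in this summary, so the sketch assumes one reading of "exact trajectory agreement": a predicted trajectory counts as a match only if, at every position, the ground-truth item appears among the top-N candidates predicted for that position. The function names and the input layout (one ranked candidate list per position) are hypothetical.

```python
def seq_match_at_n(per_position_rankings, target, n):
    """Return 1.0 if every target item is in the top-n list for its position.

    per_position_rankings : one ranked candidate list (best first) per
                            trajectory position (assumed input layout).
    target                : ground-truth item ids, one per position.
    n                     : cutoff N in SeqMatch@N.
    """
    if len(per_position_rankings) != len(target):
        raise ValueError("need exactly one ranked list per target position")
    hit = all(
        item in ranked[:n]
        for ranked, item in zip(per_position_rankings, target)
    )
    return 1.0 if hit else 0.0


def mean_seq_match_at_n(batch_rankings, batch_targets, n):
    """Average the per-trajectory score over a batch of users."""
    scores = [
        seq_match_at_n(rankings, target, n)
        for rankings, target in zip(batch_rankings, batch_targets)
    ]
    return sum(scores) / len(scores)


# Sequence length 4 and N = 3, the setting shown in Figure 4.
rankings = [[1, 2, 3, 9], [4, 5, 6, 9], [7, 8, 9, 1], [2, 4, 6, 8]]
target = [3, 4, 9, 6]
match_at_3 = seq_match_at_n(rankings, target, 3)  # every position hits
match_at_2 = seq_match_at_n(rankings, target, 2)  # position 0 misses item 3
```

Under this reading a single miss at any position zeroes the score for the whole trajectory, which makes the metric stricter than averaging a per-position hit rate such as HR@N.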

Theorems & Definitions (2)

  • Definition 1: User Interaction History
  • Definition 2: Multi-step Top-$K$ Sequence Prediction