Listwise Preference Diffusion Optimization for User Behavior Trajectories Prediction
Hongtao Huang, Chengkai Huang, Junda Wu, Tong Yu, Julian McAuley, Lina Yao
TL;DR
This work introduces User Behavior Trajectory Prediction (UBTP), a task that moves beyond next-item forecasting to generate coherent multi-step action sequences. It proposes Listwise Preference Diffusion Optimization (LPDO), a diffusion-based framework that integrates a Plackett-Luce listwise ranking signal into the diffusion ELBO to capture global item dependencies across a trajectory. The derivation yields a tight ELBO that couples reconstruction fidelity with listwise ranking, and a new metric, SeqMatch, evaluates trajectory-level agreement. On three real-world benchmarks, LPDO achieves state-of-the-art accuracy and sequence coherence, setting a new reference point for structured preference learning with diffusion models in sequential recommendation.
Abstract
Forecasting multi-step user behavior trajectories requires reasoning over structured preferences across future actions, a challenge overlooked by traditional sequential recommendation. This problem is critical for applications such as personalized commerce and adaptive content delivery, where anticipating a user's complete action sequence enhances both satisfaction and business outcomes. We identify an essential limitation of existing paradigms: their inability to capture global, listwise dependencies among sequence items. To address this, we formulate User Behavior Trajectory Prediction (UBTP) as a new task setting that explicitly models long-term user preferences. We introduce Listwise Preference Diffusion Optimization (LPDO), a diffusion-based training framework that directly optimizes structured preferences over entire item sequences. LPDO incorporates a Plackett-Luce supervision signal and derives a tight variational lower bound aligned with listwise ranking likelihoods, enabling coherent preference generation across denoising steps and overcoming the independent-token assumption of prior diffusion methods. To rigorously evaluate multi-step prediction quality, we propose the task-specific metric Sequential Match (SeqMatch), which measures exact trajectory agreement, and adopt Perplexity (PPL), which assesses probabilistic fidelity. Extensive experiments on real-world user behavior benchmarks demonstrate that LPDO consistently outperforms state-of-the-art baselines, establishing a new benchmark for structured preference learning with diffusion models.
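To make the listwise ranking likelihood mentioned above concrete, the following is a minimal sketch of the standard Plackett-Luce log-likelihood for one ordered list of items. It is not the LPDO objective itself: how this term enters the diffusion ELBO is not specified in the abstract, and the function name `plackett_luce_log_likelihood` and the per-item `scores` input are illustrative assumptions.

```python
import math


def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` under a Plackett-Luce model.

    scores:  real-valued preference score per item (hypothetical model output).
    ranking: item indices ordered from most to least preferred.
    """
    remaining = list(ranking)
    log_prob = 0.0
    for item in ranking:
        # Numerically stable log-sum-exp over the items not yet placed.
        max_score = max(scores[i] for i in remaining)
        log_norm = max_score + math.log(
            sum(math.exp(scores[i] - max_score) for i in remaining)
        )
        # The next item is chosen by a softmax over the remaining items.
        log_prob += scores[item] - log_norm
        remaining.remove(item)
    return log_prob
```

Under this model the probability of a full ordering factorizes into successive softmax choices over the items that remain, so the likelihood depends on the whole list rather than on each position independently.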
