MindRec: A Diffusion-driven Coarse-to-Fine Paradigm for Generative Recommendation

Mengyao Gao, Chongming Gao, Haoyan Liu, Qingpeng Cai, Peng Jiang, Jiajia Chen, Shuai Yuan, Xiangnan He

TL;DR

MindRec addresses the suboptimality of auto-regressive recommendation by emulating human reasoning through a diffusion-driven coarse-to-fine generation framework. It introduces a hierarchical category tree and codebook-based item identifiers, so the model first generates a coarse category and then completes the specific item, with Diffusion Beam Search used for decoding. Empirical results on three Amazon datasets show a 9.5% average improvement in top-1 accuracy over state-of-the-art baselines, and ablations confirm the value of structured preference learning and the proposed decoding strategy. The approach improves accuracy and explainability, and extends diffusion-based generation to discrete, structured recommendation tasks while taking the efficiency of beam search into account. The work suggests a shift toward human-inspired reasoning in generative recommender systems and points to integrating psychology-informed strategies into AI-driven recommendation as a future direction.

Abstract

Recent advancements in large language model-based recommendation systems often represent items as text or semantic IDs and generate recommendations in an auto-regressive manner. However, due to the left-to-right greedy decoding strategy and the unidirectional logical flow, such methods often fail to produce globally optimal recommendations. In contrast, human reasoning does not follow a rigid left-to-right sequence. Instead, it often begins with keywords or intuitive insights, which are then refined and expanded. Inspired by this fact, we propose MindRec, a diffusion-driven coarse-to-fine generative paradigm that emulates human thought processes. Built upon a diffusion language model, MindRec departs from auto-regressive generation by leveraging a masked diffusion process to reconstruct items in a flexible, non-sequential manner. Particularly, our method first generates key tokens that reflect user preferences, and then expands them into the complete item, enabling adaptive and human-like generation. To further emulate the structured nature of human decision-making, we organize items into a hierarchical category tree. This structure guides the model to first produce the coarse-grained category and then progressively refine its selection through finer-grained subcategories before generating the specific item. To mitigate the local optimum problem inherent in greedy decoding, we design a novel beam search algorithm, Diffusion Beam Search, tailored for our mind-inspired generation paradigm. Experimental results demonstrate that MindRec yields a 9.5% average improvement in top-1 accuracy over state-of-the-art methods, highlighting its potential to enhance recommendation performance. The implementation is available via https://github.com/Mr-Peach0301/MindRec.
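
The local optimum problem that the abstract attributes to greedy decoding can be illustrated with a small toy sketch. This is not the authors' Diffusion Beam Search: the catalogue `ITEMS`, the probability table `PROBS`, and the helpers `valid_tokens`, `toy_log_prob`, and `beam_search` are all hypothetical stand-ins, and the unmasking order is fixed from coarse to fine, whereas a masked diffusion model could in principle choose which position to fill next. The sketch only shows how keeping several partially unmasked candidates, constrained by a category tree, can recover an item that a greedy choice at the coarsest level would discard.

```python
import math
from typing import List, Optional, Tuple

MASK = None  # marks a position that has not been decoded yet
Seq = Tuple[Optional[str], ...]

# Hypothetical catalogue: each item is (category, subcategory, item_code).
ITEMS = [
    ("guitar", "acoustic", "g1"),
    ("guitar", "electric", "g2"),
    ("piano", "digital", "p1"),
]

# Hand-written stand-in for a denoiser's output, keyed by the decoded prefix.
# Assumption for illustration only; a real model would compute these scores.
PROBS = {
    (): {"guitar": 0.55, "piano": 0.45},
    ("guitar",): {"acoustic": 0.5, "electric": 0.5},
    ("piano",): {"digital": 1.0},
}


def valid_tokens(partial: Seq, pos: int) -> List[str]:
    """Tokens at `pos` that keep `partial` consistent with at least one item."""
    out = set()
    for item in ITEMS:
        if all(p is MASK or p == t for p, t in zip(partial, item)):
            out.add(item[pos])
    return sorted(out)


def toy_log_prob(partial: Seq, pos: int, token: str) -> float:
    """Toy log p(token at pos | decoded prefix)."""
    dist = PROBS.get(tuple(partial[:pos]))
    if dist is None:
        # Leaf level: the prefix already pins down a single item code.
        return 0.0
    return math.log(dist[token])


def beam_search(beam_size: int, length: int = 3) -> List[Tuple[Seq, float]]:
    """Unmask one position per step, coarse to fine, keeping the best beams."""
    beams: List[Tuple[Seq, float]] = [((MASK,) * length, 0.0)]
    for pos in range(length):
        candidates = []
        for seq, score in beams:
            for tok in valid_tokens(seq, pos):
                new_seq = seq[:pos] + (tok,) + seq[pos + 1:]
                candidates.append((new_seq, score + toy_log_prob(seq, pos, tok)))
        # Keep the highest cumulative log-probabilities.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


greedy_item, greedy_score = beam_search(beam_size=1)[0]
beam_item, beam_score = beam_search(beam_size=2)[0]
print(greedy_item, round(math.exp(greedy_score), 3))
print(beam_item, round(math.exp(beam_score), 3))
```

In this toy table, greedy decoding commits to the slightly more likely category `guitar` and ends with a sequence probability of 0.275, while a beam of size 2 keeps `piano` alive and returns `("piano", "digital", "p1")` with probability 0.45.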

Paper Structure

This paper contains 33 sections, 8 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of two decoding paradigms of generative recommendation.
  • Figure 2: An overview of the MindRec framework.
  • Figure 3: Recommendation results of decoding $n$ tokens per step on the Instruments dataset.
  • Figure 4: Performance with varying $p_{\text{mask}}$ on the Arts dataset.
  • Figure 5: Performance of MindRec with different beam sizes on the Arts dataset.