Don't Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding

Yunkai Zhang, Qiang Zhang, Feng Lin, Ruizhong Qiu, Hanchao Yu, Jiayi Liu, Yinglong Xia, Zhuoran Yu, Zeyu Zheng, Diji Yang

TL;DR

This work addresses the misalignment between end-to-end generative recommenders and real-world business objectives by injecting structured human priors directly into model training through lightweight, prior-conditioned adapter heads. A backbone-agnostic encode-then-project framework disentangles multi-faceted user intent along interpretable axes, and a hierarchical composition strategy models interactions across priors. Empirical results across three large-scale datasets show gains in standard accuracy and in beyond-accuracy metrics, including diversity, exploration, and personalization, while enabling better utilization of longer context and larger models. The approach provides a practical bridge between legacy domain knowledge and modern generative recommender systems, with scalable implementation and clear interpretability through head-specific compatibilities. Suggested future work includes formalizing prior selection and extending the method to dynamic, context-aware fusion of priors for stronger alignment with business objectives.

Abstract

Optimizing recommender systems for objectives beyond accuracy, such as diversity, novelty, and personalization, is crucial for long-term user satisfaction. To this end, industrial practitioners have accumulated vast amounts of structured domain knowledge, which we term human priors (e.g., item taxonomies, temporal patterns). This knowledge is typically applied through post-hoc adjustments during ranking or post-ranking. However, this approach remains decoupled from the core model learning, which is particularly undesirable as the industry shifts to end-to-end generative recommendation foundation models. On the other hand, many methods targeting these beyond-accuracy objectives often require architecture-specific modifications and discard these valuable human priors by learning user intent in a fully unsupervised manner. Instead of discarding the human priors accumulated over years of practice, we introduce a backbone-agnostic framework that seamlessly integrates these human priors directly into the end-to-end training of generative recommenders. With lightweight, prior-conditioned adapter heads inspired by efficient LLM decoding strategies, our approach guides the model to disentangle user intent along human-understandable axes (e.g., interaction types, long- vs. short-term interests). We also introduce a hierarchical composition strategy for modeling complex interactions across different prior types. Extensive experiments on three large-scale datasets demonstrate that our method significantly enhances both accuracy and beyond-accuracy objectives. We also show that human priors allow the backbone model to more effectively leverage longer context lengths and larger model sizes.
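To make the idea of prior-conditioned heads concrete, the following is a minimal sketch of one plausible reading of the encode-then-project design. It is not the authors' implementation. The assumptions are: each head is a plain linear projection of the backbone's user representation, each item belongs to exactly one prior group (for example an item category), and an item is scored by the head of its own group. All names (`project`, `multi_head_scores`, `top_k`, the group labels) are hypothetical and chosen for illustration only.

```python
def project(hidden, weight):
    """Apply one linear adapter head: returns weight @ hidden."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_head_scores(hidden, head_weights, item_embs, item_prior):
    """Score every item with the head tied to that item's prior group.

    hidden       -- user representation from any backbone (assumed given)
    head_weights -- {group: projection matrix}, one hypothetical head per group
    item_embs    -- {item: embedding}
    item_prior   -- {item: group}, the human prior assigning items to groups
    """
    head_vecs = {g: project(hidden, w) for g, w in head_weights.items()}
    return {
        item: dot(head_vecs[item_prior[item]], emb)
        for item, emb in item_embs.items()
    }


def top_k(scores, k):
    """Return the k highest-scoring items, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy example with two prior groups and a 2-dimensional representation.
hidden = [1.0, 2.0]
head_weights = {
    "music": [[1.0, 0.0], [0.0, 1.0]],  # identity projection
    "video": [[0.0, 1.0], [1.0, 0.0]],  # swaps the two coordinates
}
item_embs = {"a": [1.0, 0.0], "b": [1.0, 0.0]}
item_prior = {"a": "music", "b": "video"}

scores = multi_head_scores(hidden, head_weights, item_embs, item_prior)
best = top_k(scores, 1)
```

In this toy setup the two items share the same embedding, yet they receive different scores because each is compared against a different head's projection of the same user state. That is the sense in which a head can specialize along a human-understandable axis while the backbone stays unchanged.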

Paper Structure

This paper contains 41 sections, 10 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Evolution of entropy as training progresses on the validation set. Here HSTU is the backbone model.
  • Figure 2: User Prior leads to more personalized recommendations, especially on the minority user groups.
  • Figure 3: Scaling by context lengths and sizes for HSTU.
  • Figure 4: Visualization of the representation space for a particular user on Pixel8M. We plot item history, target items, and the top recommended items. The items recommended by different heads or interests are represented using different colors.
  • Figure 5: Comparison of composition strategies on Pixel8M across different HSTU model sizes.
  • ...and 3 more figures