MVIGER: Multi-View Variational Integration of Complementary Knowledge for Generative Recommender

Tongyoung Kim, Soojin Yoon, Seongku Kang, Jinyoung Yeo, Dongha Lee

TL;DR

This work examines how different LM prompt templates and item index types induce complementary knowledge in generative recommender systems, causing inconsistent outputs for the same user history. It introduces MVIGER, a unified variational framework that treats each template-index combination as a distinct view and places a learnable prior over a categorical latent variable to adaptively select or fuse views during training and inference. MVIGER builds heterogeneous item indices from two embedding sources (collaborative and semantic), encodes them with residual quantization, and integrates the resulting views through an ELBO-based objective with a tempered posterior. Across three real-world datasets, MVIGER consistently outperforms existing generative recommender baselines and gives more stable, accurate recommendations, demonstrating the practical value of probabilistic multi-view integration for generative sequential recommendation. Its inference strategies (selecting a single view or aggregating across views) trade off speed against accuracy, making the approach suitable for scalable deployment, and the results highlight the value of leveraging complementary knowledge sources in LM-based recommender architectures.
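To make the indexing step concrete, here is a minimal sketch of residual quantization in plain Python. It assumes fixed, hand-written codebooks and a toy 2-D embedding per source; in practice the codebooks would be learned (for example with an RQ-VAE or per-level k-means), and the function names, token format, and example values below are hypothetical rather than MVIGER's implementation. Each level picks the codeword nearest to the current residual and subtracts it, so the sequence of chosen indices becomes the item's index tokens. Quantizing a collaborative and a semantic embedding separately yields two heterogeneous indices for the same item.

```python
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Return one code index per level, from coarse to fine."""
    residual = list(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        # Nearest codeword to whatever is still unexplained at this level.
        best = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        codes.append(best)
        # Subtract the chosen codeword; the next level quantizes the remainder.
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return codes


def to_index_tokens(codes: Sequence[int], prefix: str) -> List[str]:
    """Hypothetical token format: <prefix_level_code>."""
    return [f"<{prefix}_{level}_{code}>" for level, code in enumerate(codes)]


# Toy 2-level codebooks shared by both sources (assumption: real systems learn one set per source).
TOY_CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],  # level 0: coarse directions
    [[0.1, 0.0], [0.0, 0.1]],  # level 1: small corrections
]

collab_emb = [1.0, 0.1]    # hypothetical collaborative embedding of one item
semantic_emb = [0.1, 0.9]  # hypothetical text-based semantic embedding of the same item

ceid = to_index_tokens(residual_quantize(collab_emb, TOY_CODEBOOKS), "c")
seid = to_index_tokens(residual_quantize(semantic_emb, TOY_CODEBOOKS), "s")
print(ceid)  # ['<c_0_0>', '<c_1_1>']
print(seid)  # ['<s_0_1>', '<s_1_0>']
```

The same item ends up with two different token sequences, which is what lets each index type carry different knowledge into the LM.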

Abstract

Language Models (LMs) have been widely used in recommender systems to incorporate textual information of items into item IDs, leveraging their advanced language understanding and generation capabilities. Recently, generative recommender systems have utilized the reasoning abilities of LMs to directly generate index tokens for potential items of interest based on the user's interaction history. To inject diverse item knowledge into LMs, prompt templates with detailed task descriptions and various indexing techniques derived from diverse item information have been explored. This paper focuses on the inconsistency in outputs caused by variations in input prompt templates and item index types, even with the same user's interaction history. Our in-depth quantitative analysis reveals that preference knowledge learned from diverse prompt templates and heterogeneous indices differs significantly, indicating a high potential for complementarity. To fully exploit this complementarity and provide consistent performance under varying prompts and item indices, we propose MVIGER, a unified variational framework that models selection among these information sources as a categorical latent variable with a learnable prior. During inference, this prior enables the model to adaptively select the most relevant source or aggregate predictions across multiple sources, thereby ensuring high-quality recommendations across diverse template-index combinations. We validate the effectiveness of MVIGER on three real-world datasets, demonstrating its superior performance over existing generative recommender baselines through the effective integration of complementary knowledge.
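The select-or-aggregate behavior of the learnable prior can be illustrated with a small sketch. It assumes each view (one template-index combination) has already produced a probability for each candidate item, and that the prior is available as per-user logits over views; both inputs, the item IDs, and the function names are hypothetical. "Select" ranks items with the single highest-prior view, while "aggregate" ranks them by the prior-weighted mixture over views.

```python
import math
from typing import Dict, List, Sequence

ViewScores = Dict[str, float]  # item id -> p(item | user history, view)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over view logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_view(prior_logits: Sequence[float], views: Sequence[ViewScores]) -> List[str]:
    """Rank items using only the view with the highest prior weight (one decoding pass)."""
    prior = softmax(prior_logits)
    best = max(range(len(prior)), key=lambda v: prior[v])
    scores = views[best]
    return sorted(scores, key=lambda item: scores[item], reverse=True)


def aggregate_views(prior_logits: Sequence[float], views: Sequence[ViewScores]) -> List[str]:
    """Rank items by sum_v prior[v] * p(item | view v); items missing from a view score 0 there."""
    prior = softmax(prior_logits)
    items = set().union(*views)  # union of candidate item ids across views
    mixture = {
        item: sum(w * view.get(item, 0.0) for w, view in zip(prior, views))
        for item in items
    }
    return sorted(mixture, key=lambda item: mixture[item], reverse=True)


# Two hypothetical views for one user, e.g. (template A, CeID) and (template B, SeID).
VIEWS = [
    {"i1": 0.5, "i2": 0.3, "i3": 0.2},
    {"i1": 0.1, "i2": 0.6, "i3": 0.3},
]
PRIOR_LOGITS = [0.2, 0.0]  # hypothetical learned prior logits for this user

print(select_view(PRIOR_LOGITS, VIEWS))      # ['i1', 'i2', 'i3']
print(aggregate_views(PRIOR_LOGITS, VIEWS))  # ['i2', 'i1', 'i3']
```

Here the prior slightly favors the first view, so "select" returns that view's ranking unchanged, whereas the mixture lets the second view's strong preference for i2 move it to the top. Selecting needs only one view to be decoded, while aggregating costs one pass per view, which is the speed/accuracy trade-off in miniature.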

Paper Structure

This paper contains 33 sections, 11 equations, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: Inconsistent predictions for the same user's history due to variations in prompt templates and item indices, motivating the need for integrating complementary knowledge.
  • Figure 2: PER (%) of CeID-CeID and SeID-SeID results from 10 different prompt templates in Amazon Sports.
  • Figure 3: Overview of the MVIGER framework. For each user, heterogeneous item indices (e.g., CeID and SeID) are first constructed from the user’s interaction history; paired with prompt templates, they form multiple latent views. MVIGER then jointly trains the sequential recommender with a variational prior distribution over these latent views, capturing their relative importance for each user and guiding the integration of information from diverse template-index combinations. At inference, the learned prior adaptively selects or aggregates information from the candidate views, integrating complementary knowledge to generate a final ranked list that is robust and consistent across different template-index settings.
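The joint training step (an ELBO with a tempered posterior over the categorical view variable) can be sketched for a single user as follows. This is a minimal, assumed formulation, not the paper's exact objective: the prior p(z=v|x) is taken as a softmax over per-user logits, log p(y|x,z=v) is the log-likelihood of the target item under view v, and the tempered posterior is assumed to be q(z=v) ∝ exp((log p(z=v|x) + log p(y|x,z=v)) / τ). With τ = 1 this is the exact posterior and the ELBO equals the log marginal likelihood; τ > 1 flattens q toward uniform, and the ELBO stays a lower bound for any τ.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(logits: Sequence[float]) -> List[float]:
    lse = logsumexp(logits)
    return [x - lse for x in logits]


def tempered_posterior(prior_logits, view_loglik, tau: float) -> List[float]:
    """Assumed form: q(z=v) proportional to exp((log p(z=v|x) + log p(y|x,z=v)) / tau)."""
    log_prior = log_softmax(prior_logits)
    scaled = [(lp + ll) / tau for lp, ll in zip(log_prior, view_loglik)]
    return [math.exp(lq) for lq in log_softmax(scaled)]


def elbo(prior_logits, view_loglik, tau: float = 1.0) -> float:
    """E_q[log p(y|x,z)] - KL(q || p(z|x)); a training loss would be its negative."""
    log_prior = log_softmax(prior_logits)
    q = tempered_posterior(prior_logits, view_loglik, tau)
    expected_ll = sum(qv * ll for qv, ll in zip(q, view_loglik))
    kl = sum(qv * (math.log(qv) - lp) for qv, lp in zip(q, log_prior) if qv > 0)
    return expected_ll - kl


def log_marginal(prior_logits, view_loglik) -> float:
    """log sum_v p(z=v|x) p(y|x,z=v), the quantity the ELBO lower-bounds."""
    log_prior = log_softmax(prior_logits)
    return logsumexp([lp + ll for lp, ll in zip(log_prior, view_loglik)])


# Hypothetical single-user example: three views and a uniform prior.
PRIOR_LOGITS = [0.0, 0.0, 0.0]
VIEW_LOGLIK = [-2.0, -1.0, -3.0]  # log p(target item | history, view v)

for tau in (1.0, 2.0):
    print(tau, round(elbo(PRIOR_LOGITS, VIEW_LOGLIK, tau), 4),
          round(log_marginal(PRIOR_LOGITS, VIEW_LOGLIK), 4))
```

Details such as whether gradients flow through q or how τ is scheduled are left out of this sketch; it only shows how a tempered variational distribution plugs into the standard ELBO.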