MoToRec: Sparse-Regularized Multimodal Tokenization for Cold-Start Recommendation

Jialin Liu, Zhaorui Zhang, Ray C. C. Cheung

TL;DR

This work addresses the item cold-start problem in GNN-based recommender systems by moving from continuous feature fusion to discrete semantic tokenization. It introduces MoToRec, which leverages a sparsely regularized Residual Quantized Variational Autoencoder (RQ-VAE) to produce compositional, interpretable tokens, augmented by adaptive rarity amplification and a hierarchical multi-source graph encoder for robust fusion. Key contributions include sparsity-driven disentanglement of token vocabularies, rarity-aware learning that focuses on scarce items, and a modular fusion scheme that aligns content-based and collaborative signals, all trained with ranking and contrastive objectives. Empirical results on three large Amazon datasets show state-of-the-art performance, with pronounced gains in cold-start scenarios, underscoring the practicality and scalability of discrete tokenization for multimodal recommendation.
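The residual quantization at the core of an RQ-VAE can be illustrated in a few lines. The block below is a minimal, non-authoritative sketch in plain Python: it assumes fixed, hand-picked codebooks and an already-encoded continuous item latent `z`, and it stands in a simple L1 penalty for the paper's sparse regularizer, whose exact form is not given here. Encoder/decoder networks, codebook learning, and straight-through gradients are omitted, and the names `residual_quantize` and `l1_sparsity` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    z: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]
) -> Tuple[List[int], Vector]:
    """Map a continuous latent z to one token per codebook level.

    Each level quantizes what the previous levels failed to explain: pick the
    codeword nearest to the current residual, record its index, and subtract
    it before moving to the next level. The token tuple is the item's
    compositional code; the sum of the chosen codewords is its reconstruction.
    """
    residual = list(z)
    recon = [0.0] * len(z)
    tokens: List[int] = []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda k: sq_dist(residual, book[k]))
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
        recon = [s + c for s, c in zip(recon, book[idx])]
    return tokens, recon


def l1_sparsity(z: Sequence[float]) -> float:
    """Hypothetical stand-in for the sparse regularizer: L1 norm of the latent."""
    return sum(abs(x) for x in z)


if __name__ == "__main__":
    codebooks = [
        [[1.0, 0.0], [0.0, 1.0]],              # level 0: coarse semantics
        [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]],  # level 1: finer corrections
    ]
    tokens, recon = residual_quantize([0.02, 0.95], codebooks)
    print(tokens)  # [1, 2]
    print(recon)   # [0.0, 1.0]
```

In this sketch, two items whose latents fall near the same level-0 codeword share their first token and differ only in later ones, which is one way a compositional code can let a new item share structure with warm items.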

Abstract

Graph neural networks (GNNs) have greatly advanced recommender systems by effectively modeling complex user-item interactions, yet data sparsity and the item cold-start problem significantly degrade performance, particularly for new items with little or no interaction history. While multimodal content offers a promising solution, existing methods produce suboptimal representations for new items because noise and entanglement dominate in sparse data. To address this, we reformulate multimodal recommendation as discrete semantic tokenization. We present Sparse-Regularized Multimodal Tokenization for Cold-Start Recommendation (MoToRec), a framework centered on a sparsely regularized Residual Quantized Variational Autoencoder (RQ-VAE) that encodes each item as a compositional semantic code of discrete, interpretable tokens. MoToRec comprises three synergistic components: (1) the sparsely regularized RQ-VAE, which promotes disentangled representations; (2) a novel adaptive rarity amplification mechanism that prioritizes learning for cold-start items; and (3) a hierarchical multi-source graph encoder that robustly fuses content signals with collaborative signals. Extensive experiments on three large-scale datasets show that MoToRec outperforms state-of-the-art methods in both overall and cold-start scenarios. Our work demonstrates that discrete tokenization is an effective and scalable alternative for mitigating the long-standing cold-start challenge.
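Component (3) fuses content and collaborative signals on the user-item graph. The internals of the hierarchical encoder are not specified in this summary, so the sketch below swaps in LightGCN-style propagation (symmetric degree normalization, mean over layers), run separately on an ID-embedding source and a content-embedding source, followed by a weighted sum. The two-source setup, the function names `propagate` and `fuse`, and the fusion weights are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Emb = Dict[str, List[float]]


def propagate(edges: Sequence[Tuple[str, str]], emb: Emb, num_layers: int) -> Emb:
    """LightGCN-style propagation over an undirected user-item graph.

    Each layer replaces a node's vector with the sum of its neighbours'
    vectors, each scaled by 1 / sqrt(deg(node) * deg(neighbour)). The output
    is the mean of the layer-0 .. layer-L embeddings. Isolated nodes (e.g.
    cold-start items) receive zeros from every propagated layer.
    """
    neigh: Dict[str, List[str]] = {n: [] for n in emb}
    for u, i in edges:
        neigh[u].append(i)
        neigh[i].append(u)
    layers = [emb]
    for _ in range(num_layers):
        cur = layers[-1]
        nxt: Emb = {}
        for n, vec in cur.items():
            acc = [0.0] * len(vec)
            for m in neigh[n]:
                w = 1.0 / math.sqrt(len(neigh[n]) * len(neigh[m]))
                acc = [a + w * x for a, x in zip(acc, cur[m])]
            nxt[n] = acc
        layers.append(nxt)
    k = len(layers)
    return {n: [sum(layer[n][d] for layer in layers) / k for d in range(len(emb[n]))] for n in emb}


def fuse(sources: Sequence[Emb], weights: Sequence[float]) -> Emb:
    """Weighted sum of per-source node embeddings (hypothetical fusion rule)."""
    return {
        n: [sum(w * src[n][d] for w, src in zip(weights, sources)) for d in range(len(sources[0][n]))]
        for n in sources[0]
    }


if __name__ == "__main__":
    edges = [("u1", "i1")]  # i2 is a cold-start item with no interactions
    id_emb = {"u1": [1.0, 0.0], "i1": [0.0, 1.0], "i2": [0.0, 0.0]}
    content_emb = {"u1": [0.0, 0.0], "i1": [0.5, 0.5], "i2": [0.4, 0.6]}
    fused = fuse([propagate(edges, id_emb, 1), propagate(edges, content_emb, 1)], [0.5, 0.5])
    print(fused["i2"])  # approximately [0.1, 0.15]
```

Here the cold-start item `i2` gets nothing from the collaborative source, so its fused representation comes entirely from the content source; this is the gap a content-driven code is meant to fill.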

Paper Structure (30 sections, 10 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: From continuous alignment to discrete compositional codes. Top: Existing methods struggle with noisy alignment and uninformative IDs. Bottom: MoToRec generates robust and interpretable codes for effective cold-start recommendation.
  • Figure 2: The overall architecture of MoToRec. It consists of three main stages: (1) a sparsely regularized multimodal tokenization module that converts raw features into discrete codes using RQ-VAEs; (2) a hierarchical multi-source graph encoding module that learns and fuses preferences; and (3) an optimization module. The optimization is guided by both ranking and self-supervised contrastive losses and is made rarity-aware through a dynamic weighting scheme.
  • Figure 3: Performance comparison on the cold-start item set.
  • Figure 4: Individual hyperparameter sensitivity analysis for N@20 and R@20 across all datasets for key parameters.
  • Figure 5: Pairwise hyperparameter study on the Sports dataset (N@20).
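The optimization stage in the Figure 2 caption combines a ranking loss with a self-supervised contrastive loss and makes training rarity-aware through dynamic weighting. The exact losses and weighting schedule are not stated in this summary, so the sketch below assumes BPR for ranking, InfoNCE with cosine similarity for the contrastive term, and a hypothetical inverse-popularity weight (1 + count)^(-alpha). All function names and hyperparameters (`alpha`, `tau`, `lam`) are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple


def rarity_weight(count: int, alpha: float = 0.5) -> float:
    """Hypothetical rarity weight (1 + count) ** -alpha.

    Items with fewer interactions get larger weights; an item with no
    interactions gets weight 1.0. alpha controls how fast weights decay.
    """
    return (1.0 + count) ** (-alpha)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """BPR ranking loss: -log sigmoid(pos_score - neg_score)."""
    return softplus(-(pos_score - neg_score))


def rarity_weighted_bpr(
    triples: Sequence[Tuple[float, float, str]], item_counts: Dict[str, int], alpha: float = 0.5
) -> float:
    """Mean BPR loss, each term scaled by the positive item's rarity weight."""
    total = 0.0
    for pos_score, neg_score, item in triples:
        total += rarity_weight(item_counts.get(item, 0), alpha) * bpr_loss(pos_score, neg_score)
    return total / len(triples)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(
    anchor: Sequence[float], positive: Sequence[float], negatives: Sequence[Sequence[float]], tau: float = 0.2
) -> float:
    """InfoNCE: negative log-softmax score of the positive among all candidates."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    counts = {"popular": 3}  # "cold" is absent, i.e. zero interactions
    rank = rarity_weighted_bpr([(0.0, 0.0, "cold"), (0.0, 0.0, "popular")], counts)
    # Contrastive term: pull an item's content view toward its collaborative view.
    ssl = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]], tau=0.2)
    lam = 0.1  # hypothetical trade-off weight
    print(rank + lam * ssl)
```

With identical scores, the cold item's BPR term counts twice as much as the popular item's (weights 1.0 vs. 0.5 at `alpha=0.5`), which is the intended effect of rarity-aware weighting in this sketch.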