VI-MMRec: Similarity-Aware Training Cost-free Virtual User-Item Interactions for Multimodal Recommendation

Jinfeng Xu, Zheyu Chen, Shuo Yang, Jinze Li, Zitong Wan, Hewei Wang, Weijie Liu, Yijie Li, Edith C. H. Ngai

TL;DR

This work addresses data sparsity in multimodal recommendation by introducing VI-MMRec, a plug-and-play framework that augments sparse user-item interactions with similarity-aware virtual interactions derived from modality-specific item features. It offers two construction strategies, Overlay and Synergistic, paired with a statistically guided weighting scheme that balances real and virtual signals, and it preserves training efficiency by freezing modality embeddings. Across six real-world datasets and seven strong baselines, VI-MMRec yields consistent, significant improvements, with especially strong gains when more modalities are available (e.g., TikTok). The results indicate that similarity-driven augmentation is a practical and robust addition to multimodal recommender systems that requires minimal engineering effort.

Abstract

Although existing multimodal recommendation models have shown promising performance, their effectiveness remains limited by the pervasive data sparsity problem. This problem arises because users typically interact with only a small subset of the available items, which leads existing models to arbitrarily treat unobserved items as negative samples. To address this, we propose VI-MMRec, a model-agnostic and training cost-free framework that enriches sparse user-item interactions with similarity-aware virtual user-item interactions. These virtual interactions are constructed from the modality-specific feature similarities of the items each user has interacted with. Specifically, VI-MMRec introduces two strategies: (1) Overlay, which aggregates modality-specific similarities independently to preserve modality-specific user preferences, and (2) Synergistic, which fuses cross-modal similarities holistically to capture complementary user preferences. To ensure high-quality augmentation, we design a statistically informed weight allocation mechanism that adaptively assigns weights to virtual user-item interactions based on dataset-specific modality relevance. As a plug-and-play framework, VI-MMRec integrates with existing models and enhances their performance without modifying their core architecture, so it can be adopted with minimal implementation effort. Moreover, VI-MMRec introduces no additional overhead during training, which makes it well suited to practical deployment. Comprehensive experiments on six real-world datasets with seven state-of-the-art multimodal recommendation models validate the effectiveness of VI-MMRec.
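To make the difference between the two strategies concrete, the following is a minimal sketch and not the paper's implementation. It assumes cosine similarity over item features, a top-k selection of unobserved items per user, fixed per-modality weights standing in for the statistically informed weight allocation, and a scalar `lam` that scales virtual interactions relative to real ones. All function names, the `weights` dictionary, and this reading of `k` and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_sim(feats):
    """Item-item cosine similarity from an (items x dims) feature matrix."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def topk_virtual(R, sim, k):
    """Mark, per user, the k unobserved items most similar to their interacted items.

    Assumption: a user's affinity to an item is the summed similarity between
    that item and the items the user has already interacted with.
    """
    scores = R @ sim
    scores = np.where(R > 0, -np.inf, scores)   # never re-add observed items
    top = np.argsort(-scores, axis=1)[:, :k]    # indices of the k best candidates
    rows = np.arange(R.shape[0])[:, None]
    V = np.zeros_like(scores)
    # Skip slots that are -inf (user has fewer than k unobserved items).
    V[rows, top] = np.isfinite(scores[rows, top]).astype(float)
    return V

def overlay(R, feats_by_modality, weights, k):
    """Overlay-style: build virtual interactions per modality, then add them up."""
    V = np.zeros(R.shape, dtype=float)
    for name, feats in feats_by_modality.items():
        V += weights[name] * topk_virtual(R, cosine_sim(feats), k)
    return V

def synergistic(R, feats_by_modality, weights, k):
    """Synergistic-style: fuse the similarity matrices first, then select once."""
    fused = sum(weights[n] * cosine_sim(f) for n, f in feats_by_modality.items())
    return topk_virtual(R, fused, k)

def augment(R, V, lam):
    """Combine real interactions with down-weighted virtual ones (lam is hypothetical)."""
    return R + lam * V
```

In this sketch the augmented matrix is computed once before training from frozen item features, so the downstream recommender only sees a denser interaction matrix and needs no architectural change. The Overlay variant can assign several partially weighted virtual items per user (one set per modality), while the Synergistic variant selects a single set from the fused similarity.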

Paper Structure

This paper contains 29 sections, 13 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The overall framework of our VI-MMRec.
  • Figure 2: Performance comparison for VI-MMRec and all variants on six multimodal recommendation models across all six datasets regarding Recall@10.
  • Figure 3: Sparsity degree analysis on the Baby dataset in terms of Recall@10.
  • Figure 4: Performance comparison w.r.t. key hyper-parameters ($k$ and $\lambda$) for DRAGON and DiffMM models across all datasets.