Disentangling and Generating Modalities for Recommendation in Missing Modality Scenarios

Jiwan Kim, Hongseok Kang, Sein Kim, Kibum Kim, Chanyoung Park

TL;DR

This work tackles realistic missing-modality challenges in multi-modal recommender systems by introducing DGMRec, a framework that disentangles modality features into general and specific components and generates missing modalities via cross-modal alignment and user preferences. The Disentangling Modality Feature module uses separate encoders and information-based losses (CLUB and InfoNCE) to preserve modality-specific information while aligning general attributes across modalities. The Missing Modality Generation module reconstructs and synthesizes missing features, and periodically refines the item-item graph with the generated signals to keep learning stable. Two alignment mechanisms connect the modality signals with collaborative filtering. Empirical results across multiple datasets show that DGMRec consistently outperforms state-of-the-art baselines under missing-modality and new-item settings and enables cross-modal retrieval, highlighting its practical applicability for industrial recommendation and retrieval systems.
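To make the alignment idea concrete, the following is a minimal NumPy sketch of an InfoNCE-style contrastive loss between the general features of two modalities. It is an illustration under stated assumptions, not the authors' implementation: the function name `info_nce`, the temperature value, the feature dimensions, and the synthetic data are all hypothetical, and the CLUB term used for the specific features is not shown.

```python
import numpy as np

def info_nce(a, b, temperature=0.2):
    """InfoNCE loss where row i of `a` and row i of `b` form a positive pair.

    All other rows of `b` act as negatives for row i of `a`.
    The temperature is an illustrative default, not a value from the paper.
    """
    # L2-normalise so the dot product is a cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

# Synthetic stand-ins for the general features of 8 items (assumed shapes).
rng = np.random.default_rng(0)
image_general = rng.normal(size=(8, 16))
text_aligned = image_general + 0.01 * rng.normal(size=(8, 16))
text_random = rng.normal(size=(8, 16))

loss_aligned = info_nce(image_general, text_aligned)
loss_random = info_nce(image_general, text_random)
```

In this toy setup, `loss_aligned` comes out lower than `loss_random`, which is the behaviour an alignment objective of this kind rewards: general features of the same item in different modalities are pulled together relative to other items.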

Abstract

Multi-modal recommender systems (MRSs) have achieved notable success in improving personalization by leveraging diverse modalities such as images, text, and audio. However, two key challenges remain insufficiently addressed: (1) insufficient consideration of missing modality scenarios and (2) the overlooking of unique characteristics of modality features. These challenges result in significant performance degradation in realistic situations where modalities are missing. To address these issues, we propose Disentangling and Generating Modality Recommender (DGMRec), a novel framework tailored for missing modality scenarios. DGMRec disentangles modality features into general and specific modality features from an information-based perspective, enabling richer representations for recommendation. Building on this, it generates missing modality features by integrating aligned features from other modalities and leveraging user modality preferences. Extensive experiments show that DGMRec consistently outperforms state-of-the-art MRSs in challenging scenarios, including missing modalities and new item settings as well as diverse missing ratios and varying levels of missing modalities. Moreover, DGMRec's generation-based approach enables cross-modal retrieval, a task that existing MRSs cannot perform, highlighting its adaptability and potential for real-world applications. Our code is available at https://github.com/ptkjw1997/DGMRec.
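As a rough illustration of what cross-modal retrieval could look like once a missing-modality feature has been generated, the sketch below ranks items by cosine similarity between a generated image feature and the stored image features. Everything here is an assumption for illustration: the helper `retrieve_top_k` is hypothetical, the "generated" vector is a noisy copy of a real feature standing in for a generator's output, and the paper's actual generator and retrieval protocol are not reproduced.

```python
import numpy as np

def retrieve_top_k(query, candidates, k=3):
    """Return indices of the k candidates most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending similarity and keep the first k indices.
    return np.argsort(-scores)[:k]

# Synthetic image features for 20 items (assumed shapes).
rng = np.random.default_rng(0)
image_features = rng.normal(size=(20, 16))

# Stand-in for an image feature generated from item 7's text modality:
# here simply item 7's real image feature plus small noise.
generated_image = image_features[7] + 0.05 * rng.normal(size=16)

top_items = retrieve_top_k(generated_image, image_features, k=3)
```

If the generated feature is close to the true one, as it is by construction here, the source item ranks first among the retrieved candidates.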

Paper Structure

This paper contains 30 sections, 24 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: (a) Performance drop of recent MRSs when modalities are missing. (b) Difference between the model’s recommendation scores under two conditions: when the modality exists (Original Image/Text) and when the modality is missing (NN-Injected Image/Text). Line plots indicate the performance of LGMRec (Guo et al., 2024).
  • Figure 2: Overview of DGMRec framework. It consists of the Disentangling Modality Feature module and the Missing Modality Generation module. In the Missing Modality Generation module, we illustrate the case where an item is associated with the text modality while the image modality is missing.
  • Figure 3: Performance on various missing levels on Amazon Baby and TikTok datasets.
  • Figure 4: (a) Performance on various missing ratios, and (b) relative performance drop on Amazon Baby dataset.
  • Figure 5: (a) Visualization of disentangled modality features and (b) similarity score between the features during training.
  • ...and 1 more figure