I$^3$-MRec: Invariant Learning with Information Bottleneck for Incomplete Modality Recommendation

Huilin Chen, Miaomiao Cai, Fan Liu, Zhiyong Cheng, Richang Hong, Meng Wang

TL;DR

This work tackles the robustness challenge of multimodal recommender systems when modality data are incomplete. It introduces I$^3$-MRec, a principled framework that combines Invariant Risk Minimization (IRM) to learn cross-modality invariant user-item representations with an Information Bottleneck (IB) guided missing-aware fusion to produce compact yet effective multimodal representations. The method explicitly simulates modality-missing scenarios during training and optimizes a joint objective that preserves task-relevant information while reducing reliance on raw modality content. Across three real-world datasets, I$^3$-MRec consistently outperforms state-of-the-art baselines under both full-modality and missing-modality settings, demonstrating strong robustness and generalization with practical impact for real-world deployment.
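The modality-missing simulation mentioned above can be pictured with a minimal sketch. This is not the authors' implementation: the function name `simulate_missing`, the per-modality drop rate, and the rule that every item keeps at least one modality are all assumptions made for illustration.

```python
import random

def simulate_missing(item_feats, missing_rate, rng):
    """Randomly drop modalities per item; a dropped modality becomes None.

    item_feats: {item_id: {modality_name: feature_vector}} (hypothetical layout)
    missing_rate: probability of dropping each modality independently
    rng: a random.Random instance, passed in for reproducibility
    """
    masked = {}
    for item_id, feats in item_feats.items():
        kept = {
            m: (v if rng.random() >= missing_rate else None)
            for m, v in feats.items()
        }
        # Assumption: keep at least one modality so the item stays representable.
        if all(v is None for v in kept.values()):
            m = rng.choice(sorted(feats))
            kept[m] = feats[m]
        masked[item_id] = kept
    return masked

items = {
    "item_a": {"visual": [0.1, 0.3], "textual": [0.7, 0.2]},
    "item_b": {"visual": [0.5, 0.5], "textual": [0.0, 0.9]},
}
masked_items = simulate_missing(items, missing_rate=0.5, rng=random.Random(0))
print(masked_items)
```

A training loop could resample such a mask each epoch so that the fusion module sees many different missing patterns; whether the paper resamples per epoch or per batch is not stated in this summary.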

Abstract

Multimodal recommender systems (MRS) improve recommendation performance by integrating complementary semantic information from multiple modalities. However, the assumption of complete multimodality rarely holds in practice due to missing images and incomplete descriptions, hindering model robustness and generalization. To address these challenges, we introduce a novel method called \textbf{I$^3$-MRec}, which uses \textbf{I}nvariant learning with the \textbf{I}nformation bottleneck principle for \textbf{I}ncomplete \textbf{M}odality \textbf{Rec}ommendation. To achieve robust performance in missing-modality scenarios, I$^3$-MRec enforces two pivotal properties: (i) cross-modal preference invariance, which ensures consistent user preference modeling across varying modality environments, and (ii) compact yet effective multimodal representation, which matters because modality information becomes unreliable in such scenarios and the model should therefore depend less on modality-specific information. By treating each modality as a distinct semantic environment, I$^3$-MRec employs invariant risk minimization (IRM) to learn preference-oriented representations. In parallel, a missing-aware fusion module is developed to explicitly simulate modality-missing scenarios. Built upon the Information Bottleneck (IB) principle, the module aims to preserve essential user preference signals across these scenarios while effectively compressing modality-specific information. Extensive experiments conducted on three real-world datasets demonstrate that I$^3$-MRec consistently outperforms existing state-of-the-art MRS methods across various modality-missing scenarios, highlighting its effectiveness and robustness in practical applications.
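To make the idea of treating each modality as an IRM environment more concrete, here is a minimal sketch of an IRMv1-style objective. It is an illustration under stated assumptions, not the paper's exact loss: the BPR ranking risk, the environment names, the weight `lam`, and all function names are assumptions. The only part taken from the standard IRMv1 formulation is the penalty itself, the squared gradient of each environment's risk with respect to a dummy scale `w` evaluated at `w = 1`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_risk(score_diffs, w=1.0):
    """Mean BPR loss -log(sigmoid(w * d)) over positive-minus-negative score gaps d."""
    return sum(-math.log(sigmoid(w * d)) for d in score_diffs) / len(score_diffs)

def irm_penalty(score_diffs):
    """IRMv1-style penalty: squared gradient of the risk w.r.t. the dummy scale w at w = 1."""
    # d/dw [-log sigmoid(w * d)] = -d * sigmoid(-w * d); evaluated here at w = 1.
    grad = sum(-d * sigmoid(-d) for d in score_diffs) / len(score_diffs)
    return grad ** 2

def irm_objective(env_score_diffs, lam=1.0):
    """Sum over environments of the risk plus lam times the invariance penalty."""
    total = 0.0
    for diffs in env_score_diffs.values():
        total += bpr_risk(diffs) + lam * irm_penalty(diffs)
    return total

# Hypothetical score gaps (positive item score minus negative item score),
# computed separately in each modality environment.
envs = {"visual": [1.2, 0.4, -0.3], "textual": [0.9, 0.1, 0.5]}
print(irm_objective(envs, lam=0.5))
```

The penalty is small when the same scoring function is simultaneously near-optimal in every environment, which is the sense in which the learned preference representation is "invariant" across modalities.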

Paper Structure

This paper contains 29 sections, 15 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Performance of MRS methods on the Baby dataset under two settings. “Full Modality” indicates no missing modality during training and testing. “Missing Modality” follows the MTMT setup (Section 4.3), with random modality missingness in both phases.
  • Figure 2: Overview of the proposed I$^3$-MRec framework. The model first learns preference-oriented user/item representations using a graph-based approach, guided by IRM to ensure invariance across modalities. Then, an IB-based fusion module generates compact yet effective representations by maximizing preference-relevant information while compressing modality-specific information.
  • Figure 3: Performance on different types of missing modality on the Amazon Baby dataset. The “Full” setting indicates that the baseline models were trained with complete modality scenarios.
  • Figure 4: (a) Recall@20 under varying missing modality rates on the Amazon Baby dataset. (b) Normalized performance with respect to the full-modality setting.