Do We Really Need to Drop Items with Missing Modalities in Multimodal Recommendation?

Daniele Malitesta, Emanuele Rossi, Claudio Pomo, Tommaso Di Noia, Fragkiskos D. Malliaros

TL;DR

Missing modalities are common in multimodal recommendation, and simply dropping the affected items discards valuable data. The authors introduce an untrained pre-processing imputation pipeline that combines traditional imputation with graph-aware feature propagation over the item-item co-purchase graph, using $\mathbf{R}^{\mathcal{I}} = \mathbf{R}^\top \mathbf{R}$ and its sparsified form $\overline{\mathbf{R}}^{\mathcal{I}}$ to recover the features $\mathbf{F}'_{im}$ of missing modalities. The graph-aware strategies are NeighMean, MultiHop, and PersPageRank, which leverage multi-hop neighborhoods and personalized PageRank-style normalization to improve the recovery of missing features. Experiments on three Amazon domains show that imputing missing modalities often yields better results for multimodal RSs than dropping the affected items, with graph-aware methods delivering the strongest gains. Because the pipeline is a pre-processing step, it can be plugged into existing systems. These findings suggest that handling missing modalities robustly can improve real-world multimodal recommendation without retraining end-to-end models.
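The graph-aware idea summarized above can be illustrated with a minimal sketch. It builds $\mathbf{R}^{\mathcal{I}} = \mathbf{R}^\top \mathbf{R}$ from a toy interaction matrix, sparsifies it by keeping the top-$k$ links per item, and fills each missing feature row with the weighted mean of its observed neighbors. The function names, the toy data, the weighting scheme, and the global-mean fallback for isolated items are all assumptions made for illustration; the paper's NeighMean, MultiHop, and PersPageRank formulations may differ.

```python
import numpy as np

def item_item_graph(R):
    """Co-purchase counts R^T R, with self-loops (the diagonal) removed."""
    RI = R.T @ R
    np.fill_diagonal(RI, 0)
    return RI

def top_k_sparsify(RI, k):
    """Keep, for every item (row), only its k strongest co-purchase links.
    Note: the result is not necessarily symmetric."""
    sparse = np.zeros_like(RI)
    for i, row in enumerate(RI):
        keep = np.argsort(-row, kind="stable")[:k]
        sparse[i, keep] = row[keep]
    return sparse

def neighbor_mean_impute(F, missing, A):
    """Fill rows flagged in `missing` with the weighted mean of observed neighbors."""
    observed = ~missing
    F_obs = F[observed]
    fallback = F_obs.mean(axis=0)  # assumption: global mean for isolated items
    F_out = F.astype(float).copy()
    for i in np.flatnonzero(missing):
        w = A[i, observed]  # link weights towards items with known features
        F_out[i] = w @ F_obs / w.sum() if w.sum() > 0 else fallback
    return F_out

# Toy data (hypothetical): 4 users x 4 items, 2-dimensional item features.
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]])
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [np.nan, np.nan],   # item 2: modality missing
              [np.nan, np.nan]])  # item 3: modality missing, no co-purchases
missing = np.isnan(F).any(axis=1)

A = top_k_sparsify(item_item_graph(R), k=2)
F_imputed = neighbor_mean_impute(F, missing, A)
print(F_imputed)
```

In this toy case, item 2 inherits features from items 0 and 1 in proportion to how often it was co-purchased with each, while item 3, which has no co-purchase links, falls back to the global mean. No item is dropped from the catalogue.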

Abstract

Generally, items with missing modalities are dropped in multimodal recommendation. However, with this work, we question this procedure, highlighting that it would further damage the pipeline of any multimodal recommender system. First, we show that the lack of (some) modalities is, in fact, a widespread phenomenon in multimodal recommendation. Second, we propose a pipeline that imputes missing multimodal features in recommendation by leveraging traditional imputation strategies in machine learning. Then, given the graph structure of the recommendation data, we also propose three more effective imputation solutions that leverage the item-item co-purchase graph and the multimodal similarities of co-interacted items. Our method can be plugged into any multimodal RS in the literature, working as an untrained pre-processing phase, and shows (through extensive experiments) that any data pre-filtering is not only unnecessary but also harmful to performance.
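To make the role of the graph structure more concrete, the following is a hedged sketch of multi-hop feature propagation: missing rows start at zero, features are diffused over a symmetrically normalized adjacency matrix, and observed rows are reset to their known values after every hop. This is a generic propagation scheme chosen for illustration; the function names, the normalization, and the toy path graph are assumptions, not the paper's exact MultiHop or PersPageRank definitions.

```python
import numpy as np

def sym_normalize(A):
    """Return D^{-1/2} A D^{-1/2}; isolated nodes keep all-zero rows."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def propagate_features(F, missing, A, hops=2):
    """Diffuse observed features to missing rows over `hops` propagation steps."""
    A_norm = sym_normalize(A.astype(float))
    X = np.where(missing[:, None], 0.0, F)  # missing rows start at zero
    known = X.copy()
    for _ in range(hops):
        X = A_norm @ X
        X[~missing] = known[~missing]  # observed features are never overwritten
    return X

# Toy item-item graph (hypothetical): a path 0 - 1 - 2, only item 0 observed.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
F = np.array([[1.0, 2.0],
              [np.nan, np.nan],
              [np.nan, np.nan]])
missing = np.isnan(F).any(axis=1)

one_hop = propagate_features(F, missing, A, hops=1)
two_hops = propagate_features(F, missing, A, hops=2)
print(one_hop)
print(two_hops)
```

With a single hop, item 2 receives nothing because its only neighbor (item 1) is itself missing; a second hop lets the information from item 0 reach it. This is one reason the number of propagation hops is a natural hyperparameter to study.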

Paper Structure

This paper contains 8 sections, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Impact of top-$k$ sparsification (top) and propagation hops (bottom) on the Beauty dataset for PersPageRank.