
Training-free Graph-based Imputation of Missing Modalities in Multimodal Recommendation

Daniele Malitesta, Emanuele Rossi, Claudio Pomo, Tommaso Di Noia, Fragkiskos D. Malliaros

TL;DR

The paper addresses missing modalities in multimodal recommender systems (RSs) by formalizing the problem and reframing it as graph feature interpolation on the item-item co-purchase graph. It introduces four training-free, graph-aware imputations (NeighMean, MultiHop, PersPageRank, and Heat diffusion) that propagate the available multimodal features across the graph, so imputation can run as a pre-processing step before model training. Extensive experiments on six Amazon datasets and MicroLens with multiple baseline models show that graph-based imputations largely preserve or widen the performance gap between traditional and multimodal RSs and often outperform traditional and autoencoder-based imputations, with performance sensitive to hyperparameters such as TopN and the number of propagation hops. The work demonstrates practical effectiveness, releases public code, and identifies future directions: robust end-to-end integration and the handling of cold-start and noisy data in multimodal recommendation.
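The TL;DR names NeighMean and MultiHop without spelling them out. Below is a minimal sketch, assuming NeighMean replaces a missing feature vector with the mean of its one-hop neighbours' observed vectors, and MultiHop repeats that step for a given number of hops while never overwriting observed features (in the spirit of feature propagation). The function names, the `hops` parameter, and the dictionary-based graph are hypothetical illustrations, not the authors' released code.

```python
# Hypothetical sketch of NeighMean / MultiHop-style imputation;
# assumed behaviour, not the authors' implementation.
from typing import Dict, List, Optional, Set

Vector = List[float]
Graph = Dict[int, Set[int]]              # item -> neighbouring items
Features = Dict[int, Optional[Vector]]   # None marks a missing modality


def _mean_of_known(item: int, adj: Graph, feats: Features) -> Optional[Vector]:
    """Average the non-missing feature vectors of an item's neighbours."""
    known = [feats[n] for n in adj.get(item, set()) if feats.get(n) is not None]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def neigh_mean(adj: Graph, feats: Features) -> Features:
    """One-hop imputation: each missing item takes the mean of its
    neighbours' observed features (it stays None if none are observed)."""
    out = dict(feats)
    for item, vec in feats.items():
        if vec is None:
            out[item] = _mean_of_known(item, adj, feats)
    return out


def multi_hop(adj: Graph, feats: Features, hops: int) -> Features:
    """Repeat the neighbour-mean step `hops` times. Observed features are
    never overwritten; values imputed at one hop feed the next hop."""
    missing = [i for i, v in feats.items() if v is None]
    current = dict(feats)
    for _ in range(hops):
        nxt = dict(current)
        for item in missing:
            mean = _mean_of_known(item, adj, current)
            if mean is not None:
                nxt[item] = mean
        current = nxt
    return current


# Path graph 0-1-2-3-4 where only the two end items have (2-d) features.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
feats = {0: [1.0, 0.0], 1: None, 2: None, 3: None, 4: [0.0, 1.0]}
one_hop = neigh_mean(adj, feats)
two_hops = multi_hop(adj, feats, hops=2)
```

On this toy path, one hop cannot reach the middle item, which only receives a value (the average of both ends) once a second hop is allowed; in this sketch, that reach is exactly what the hop count controls.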

Abstract

Multimodal recommender systems (RSs) represent items in the catalog through multimodal data (e.g., product images and descriptions) that, in some cases, might be noisy or (even worse) missing. In those scenarios, the common practice is to drop items with missing modalities and train the multimodal RSs on a subsample of the original dataset. To date, the problem of missing modalities in multimodal recommendation has received limited attention in the literature and, unlike missing information in traditional machine learning, still lacks a precise formalisation. In this work, we first provide a problem formalisation for missing modalities in multimodal recommendation. Second, by leveraging the user-item graph structure, we re-cast the problem of missing multimodal information as a problem of graph feature interpolation on the item-item co-purchase graph. On this basis, we propose four training-free approaches that propagate the available multimodal features throughout the item-item graph to impute the missing features. Extensive experiments on popular multimodal recommendation datasets demonstrate that our solutions can be seamlessly plugged into any existing multimodal RS and benchmarking framework while still preserving (or even widening) the performance gap between multimodal and traditional RSs. Moreover, we show that our graph-based techniques can outperform traditional machine-learning imputations under different missing-modality settings. Finally, we analyse (for the first time in multimodal RSs) how feature homophily calculated on the item-item graph can influence our graph-based imputations.
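The projection from the user-item graph to the item-item co-purchase graph can be sketched as follows, assuming that an edge weight counts the users who interacted with both items and that sparsification keeps each item's top-N heaviest edges (the TopN hyperparameter). The function names, the tie-breaking rule, and the symmetrisation step are hypothetical choices for illustration; the paper's exact weighting and sparsification may differ.

```python
# Hypothetical sketch of co-purchase projection + top-N sparsification;
# assumed behaviour, not the authors' implementation.
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def co_purchase_weights(
    interactions: Iterable[Tuple[str, int]],
) -> Dict[int, Dict[int, int]]:
    """Project the user-item graph onto items: the weight of edge (i, j) is
    the number of users who interacted with both i and j."""
    items_by_user: Dict[str, Set[int]] = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
    weights: Dict[int, Dict[int, int]] = defaultdict(dict)
    for items in items_by_user.values():
        for i, j in combinations(sorted(items), 2):
            weights[i][j] = weights[i].get(j, 0) + 1
            weights[j][i] = weights[j].get(i, 0) + 1
    return dict(weights)


def sparsify_top_n(weights: Dict[int, Dict[int, int]], top_n: int) -> Dict[int, Set[int]]:
    """Keep each item's top_n heaviest co-purchase edges (ties broken by
    item id) and symmetrise, so an edge survives if either endpoint keeps it."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for i, nbrs in weights.items():
        ranked = sorted(nbrs.items(), key=lambda kv: (-kv[1], kv[0]))
        for j, _ in ranked[:top_n]:
            adj[i].add(j)
            adj[j].add(i)
    return dict(adj)


# Toy (user, item) interactions.
interactions = [("u1", 0), ("u1", 1), ("u1", 2), ("u2", 0), ("u2", 1),
                ("u3", 1), ("u3", 2), ("u4", 2), ("u4", 3)]
weights = co_purchase_weights(interactions)
item_graph = sparsify_top_n(weights, top_n=1)
```

With `top_n=1`, the weak edge between items 0 and 2 (one shared user) is dropped while the stronger ones survive, which is the kind of trade-off between graph density and signal that a sparsification hyperparameter exposes.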

Paper Structure (29 sections, 14 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Motivating example showing the performance improvement (Recall@20) between NGCF (blue) and NGCF-M (brown) in the dropped and imputed settings, for the Music and Beauty datasets. With the imputed setting, the improvement of NGCF-M over NGCF becomes wider (and on the Music dataset even reverses direction).
  • Figure 2: Visual representation of our graph-based imputations. Starting from the user-item graph with incomplete multimodal features (a), we project and sparsify it to obtain the item-item co-purchase graph (b). Then, we propagate the available multimodal features in the obtained item-item graph (c), which allows us to impute the missing features (d).
  • Figure 3: Performance variation for (a) VBPR and (b) MGCN with the Heat imputation method by considering different sparsification rates and propagation hops on the Sports dataset.
  • Figure 4: Performance variation (Recall) on Sports for (a) VBPR and (b) MGCN with Heat imputation. Specifically, we display how the Recall changes with different levels of feature homophily ($x$-axis) and item-item sparsification ($y$-axis).
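Figures 3 and 4 study the Heat imputation as a function of sparsification and propagation hops. One way to sketch Heat and PersPageRank together, under our own assumptions, is as truncated power series of a random-walk operator P = D^-1 A on the item-item graph: heat diffusion uses exp(-t (I - P)) = e^-t * sum_k t^k / k! * P^k, and personalized PageRank uses alpha * sum_k (1 - alpha)^k * P^k, each cut off after `hops` terms. To avoid shrinking imputed vectors toward zero, the sketch diffuses both the observed features and an availability mask with the same kernel and divides the two, so each missing item receives a kernel-weighted average of observed features. The names `kernel_impute`, `heat_coeffs`, `ppr_coeffs`, and the parameters `t` and `alpha` are hypothetical; the paper may normalise the graph or treat observed items differently.

```python
# Hypothetical sketch of Heat / PersPageRank-style imputation as polynomial
# filters of P = D^-1 A; assumed behaviour, not the authors' implementation.
import math
from typing import Dict, List, Optional, Sequence, Set

Graph = Dict[int, Set[int]]
Features = Dict[int, Optional[List[float]]]
Signal = Dict[int, float]


def _walk(adj: Graph, v: Signal) -> Signal:
    """One step of the random-walk operator P = D^-1 A: every item takes the
    mean value of its neighbours (isolated items get 0)."""
    out: Signal = {}
    for i in v:
        nbrs = adj.get(i, set())
        out[i] = sum(v[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
    return out


def kernel_impute(adj: Graph, feats: Features, coeffs: Sequence[float]) -> Features:
    """Fill missing rows with a kernel-weighted average of observed features,
    using the kernel sum_k coeffs[k] * P^k (len(coeffs) - 1 hops).
    Assumes at least one item has observed features."""
    items = list(feats)
    dim = len(next(v for v in feats.values() if v is not None))
    # Channel 0 is the availability mask; the others are feature dimensions
    # with missing entries set to 0.
    mask: Signal = {i: float(feats[i] is not None) for i in items}
    signals = [mask] + [
        {i: (feats[i][d] if feats[i] is not None else 0.0) for i in items}
        for d in range(dim)
    ]
    diffused = []
    for sig in signals:
        power = dict(sig)                                  # P^0 applied to sig
        total = {i: coeffs[0] * x for i, x in sig.items()}
        for c in coeffs[1:]:
            power = _walk(adj, power)                      # next power of P
            total = {i: total[i] + c * power[i] for i in items}
        diffused.append(total)
    den, nums = diffused[0], diffused[1:]
    out = dict(feats)                                      # observed rows kept as-is
    for i in items:
        if feats[i] is None and den[i] > 0:
            out[i] = [num[i] / den[i] for num in nums]
    return out


def heat_coeffs(t: float, hops: int) -> List[float]:
    """Truncated exp(-t (I - P)) = e^-t * sum_k t^k / k! * P^k."""
    return [math.exp(-t) * t**k / math.factorial(k) for k in range(hops + 1)]


def ppr_coeffs(alpha: float, hops: int) -> List[float]:
    """Truncated personalized PageRank kernel alpha * sum_k (1 - alpha)^k * P^k."""
    return [alpha * (1 - alpha) ** k for k in range(hops + 1)]


# Path graph 0-1-2-3-4 with features only on the two end items.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
feats = {0: [1.0, 0.0], 1: None, 2: None, 3: None, 4: [0.0, 1.0]}
heat = kernel_impute(adj, feats, heat_coeffs(t=1.0, hops=3))
ppr = kernel_impute(adj, feats, ppr_coeffs(alpha=0.15, hops=3))
```

With one hop, the kernel never reaches the middle item of the path, so it stays missing; with three hops it receives an equal blend of both endpoints. Sharing one `kernel_impute` between the two methods makes explicit that, in this sketch, they differ only in how quickly the coefficients decay with distance.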