DReX: An Explainable Deep Learning-based Multimodal Recommendation Framework

Adamya Shyam, Venkateswara Rao Kagita, Bharti Rana, Vikas Kumar

TL;DR

DReX is a unified multimodal recommendation framework that incrementally refines user and item representations using interaction-level features from multimodal feedback. It also automatically generates interpretable keyword profiles for both users and items, which supplement the recommendation process with interpretable preference indicators.
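The abstract does not say how DReX derives its keyword profiles. The sketch below substitutes plain TF-IDF only to make the idea of a per-user or per-item keyword profile concrete. It is a hypothetical illustration, not the authors' method. The `keyword_profiles` function, the stopword list, the smoothed IDF formula and `top_n` are all assumptions. Each owner's reviews are pooled into one pseudo-document, and the owner's highest-weighted terms form its profile.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the demo.
STOPWORDS = {"a", "and", "but", "is", "it", "of", "the", "this", "too", "was"}

def tokenize(text):
    # Lowercase, keep alphabetic tokens, drop stopwords.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def keyword_profiles(reviews, top_n=3):
    """reviews: (owner_id, text) pairs, where the owner is a user or an item.
    Returns owner_id -> top_n keywords ranked by smoothed TF-IDF (a stand-in technique)."""
    docs = {}
    for owner, text in reviews:
        docs.setdefault(owner, []).extend(tokenize(text))  # one pseudo-document per owner
    n_docs = len(docs)
    df = Counter(w for toks in docs.values() for w in set(toks))  # document frequency
    profiles = {}
    for owner, toks in docs.items():
        tf = Counter(toks)
        score = {w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w, c in tf.items()}
        ranked = sorted(score, key=lambda w: (-score[w], w))  # ties broken alphabetically
        profiles[owner] = ranked[:top_n]
    return profiles

reviews = [
    ("u1", "Great battery life and great screen"),
    ("u1", "Battery drains fast"),
    ("u2", "The screen is sharp but the price is high"),
    ("u2", "Price too high"),
]
print(keyword_profiles(reviews))
# {'u1': ['battery', 'great', 'drains'], 'u2': ['high', 'price', 'sharp']}
```

To get item profiles instead, pass `(item_id, text)` pairs rather than `(user_id, text)` pairs. Terms such as "screen", which both users mention, are down-weighted by the IDF factor. This is what keeps a profile distinctive rather than generic.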

Abstract

Multimodal recommender systems leverage diverse data sources, such as user interactions, content features, and contextual information, to address challenges like cold-start and data sparsity. However, existing methods often suffer from one or more key limitations: they process different modalities in isolation, require complete multimodal data for every interaction during training, or learn user and item representations independently. These factors increase complexity and can cause misalignment between user and item embeddings. To address these challenges, we propose DReX, a unified multimodal recommendation framework that incrementally refines user and item representations by leveraging interaction-level features from multimodal feedback. Our model employs gated recurrent units to selectively integrate these fine-grained features into global representations. This incremental update mechanism provides three key advantages: (1) simultaneous modeling of both nuanced interaction details and broader preference patterns, (2) elimination of separate user and item feature extraction processes, which improves the alignment of their learned representations, and (3) inherent robustness to varying or missing modalities. We evaluate the proposed approach on three real-world datasets containing reviews and ratings as interaction modalities. By treating review text as a modality, our approach automatically generates interpretable keyword profiles for both users and items, which supplement the recommendation process with interpretable preference indicators. Experimental results demonstrate that our approach outperforms state-of-the-art methods across all evaluated datasets.
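The incremental update described in the abstract can be pictured as a recurrent fold. Each interaction produces one fine-grained feature vector, and a gated recurrent unit merges that vector into the current global user state and the current global item state. The sketch below is a minimal pure-Python illustration under stated assumptions, not DReX's architecture. `GRUCell`, `fuse` and `refine` are hypothetical names, and bias terms are omitted. Modalities are assumed to be pre-projected to a common dimension. A missing modality is handled by simply averaging whichever vectors are present.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class GRUCell:
    """Minimal GRU cell (bias terms omitted). Hypothetical stand-in, not DReX's code."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.Wz, self.Uz = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wr, self.Ur = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wh, self.Uh = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)

    def __call__(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # z decides how much of the new interaction evidence overwrites the old state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def fuse(modalities):
    """Average the modality vectors that are present; None marks a missing modality.
    Assumes every present vector already has the same dimension."""
    present = [v for v in modalities.values() if v is not None]
    if not present:
        return None
    return [sum(col) / len(present) for col in zip(*present)]

def refine(users, items, interactions, user_cell, item_cell):
    """Fold each interaction's fused feature into both global states, in order."""
    for u, i, feats in interactions:
        x = fuse(feats)
        if x is None:  # no modality observed: leave both states untouched
            continue
        users[u] = user_cell(x, users[u])
        items[i] = item_cell(x, items[i])

dim = 4
users = {"alice": [0.0] * dim}
items = {"lamp": [0.0] * dim}
interactions = [
    ("alice", "lamp", {"review": [0.2, -0.1, 0.4, 0.0], "rating": [0.8] * dim}),
    ("alice", "lamp", {"review": None, "rating": [0.2] * dim}),  # review missing
    ("alice", "lamp", {"review": None, "rating": None}),  # nothing observed, skipped
]
refine(users, items, interactions, GRUCell(dim, dim, seed=1), GRUCell(dim, dim, seed=2))
print(users["alice"], items["lamp"])
```

In this sketch, the same interaction feature `x` drives both updates. User and item states are therefore built from shared evidence rather than from two separate extractors, which is one way to read the alignment argument in the abstract. The update gate `z` is what makes the integration selective. Values near 0 keep the existing global state, and values near 1 let the new interaction dominate.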

Paper Structure (18 sections, 9 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Pipeline of the proposed DReX framework. All vectors are column-oriented.
  • Figure 2: Outline of review-based feature extraction module.
  • Figure 3: Impact of learning rate ($\alpha$) and regularization ($\lambda$) on the DReX approach.
  • Figure 4: Performance of the compared algorithms in terms of F1-score.
  • Figure 5: Results of the compared algorithms (mean$\pm$standard deviation rank) in terms of $MAE$ and $NDCG@k$. ("$\uparrow$" indicates that a higher value is better, while "$\downarrow$" indicates that a lower value is better.)
  • ...and 3 more figures
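The captions of Figures 4 and 5 report F1-score, MAE and NDCG@k. For readers unfamiliar with these metrics, the sketch below gives their textbook definitions. The paper's exact evaluation protocol is not stated here, including the relevance threshold, the choice of k, and how the ideal ranking is formed. The helper names (`mae`, `ndcg_at_k`, `f1_at_k`) and those protocol choices are therefore assumptions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted ratings (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def ndcg_at_k(ranked_rels, k):
    """NDCG@k with linear gain and log2 discount (higher is better).
    ranked_rels holds the true relevances in the model's ranked order; the ideal
    ordering is taken over this same candidate list (an assumption)."""
    def dcg(rels):
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

def f1_at_k(recommended, relevant, k):
    """Harmonic mean of precision@k and recall@k against a set of relevant items."""
    hits = len(set(recommended[:k]) & set(relevant))
    if hits == 0:
        return 0.0
    precision, recall = hits / k, hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

print(mae([4, 3, 5], [3.5, 3, 4]))  # 0.5
print(ndcg_at_k([1, 3], k=2))
print(f1_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=4))
```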