GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation

Guojiao Lin; Zhen Meng; Dongjie Wang; Qingqing Long; Yuanchun Zhou; Meng Xiao

TL;DR

GUME tackles long-tail multimodal recommendations by enhancing tail-item connectivity through multimodal similarity-driven graph augmentation and by learning richer user modality representations via explicit interaction and extended interest embeddings. It introduces modality item graphs, semantic-neighbor augmentation, attribute separation into coarse and fine granularity, and dual alignment objectives to denoise signals from internal and external perspectives, all trained with a BPR-based objective. The approach yields strong performance gains across four Amazon domains, with notable improvements on tail items and evidence from ablations, visualizations, and hyperparameter analyses. Overall, GUME provides a scalable, generalizable framework that leverages multimodal item similarities and contrastive learning to improve long-tail multimodal recommendations in real-world datasets.
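To make the graph augmentation idea concrete, here is a minimal sketch in plain Python. It builds a top-k cosine-similarity neighbor list per modality, treats items that are neighbors under both modalities as semantic neighbors, and adds those item-item edges to the user-item edge set. The intersection rule, the function names, the toy feature vectors, and the choice of `k` are all assumptions made for illustration; the summary above does not specify them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def top_k_neighbors(features, k):
    """For each item index, return the indices of its k most similar other items."""
    neighbors = {}
    for i, fi in enumerate(features):
        scored = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by index
        neighbors[i] = [j for _, j in scored[:k]]
    return neighbors

def semantic_neighbors(image_feats, text_feats, k):
    """Assumed rule: keep only items that are top-k neighbors in BOTH modalities."""
    img = top_k_neighbors(image_feats, k)
    txt = top_k_neighbors(text_feats, k)
    return {i: sorted(set(img[i]) & set(txt[i])) for i in img}

def augment_edges(interactions, sem):
    """Union of user-item edges and undirected item-item semantic-neighbor edges."""
    edges = {(("u", u), ("i", i)) for u, i in interactions}
    for i, neigh in sem.items():
        for j in neigh:
            lo, hi = min(i, j), max(i, j)
            edges.add((("i", lo), ("i", hi)))
    return edges

# Toy data (hypothetical): item 3 is a tail item with no interactions at all.
image_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
interactions = [(0, 0), (0, 1), (1, 2)]

sem = semantic_neighbors(image_feats, text_feats, k=1)
edges = augment_edges(interactions, sem)
```

In this toy example, item 3 has no user interactions, but the augmented edge set links it to item 2, so graph propagation could now reach it through user 1.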

Abstract

Multimodal recommendation systems (MMRS) have received considerable attention from the research community due to their ability to jointly utilize information from user behavior and product images and text. Previous research has two main issues. First, many long-tail items in recommendation systems have limited interaction data, making it difficult to learn comprehensive and informative representations, yet past MMRS studies have overlooked this issue. Second, users' modality preferences are crucial to their behavior, but previous research has primarily focused on learning item modality representations, while user modality representations have remained relatively simplistic. To address these challenges, we propose a novel Graphs and User Modalities Enhancement (GUME) framework for long-tail multimodal recommendation. Specifically, we first enhance the user-item graph using multimodal similarity between items. This improves the connectivity of long-tail items and helps them learn high-quality representations through graph propagation. Then, we construct two types of user modality features: explicit interaction features and extended interest features. By using the user modality enhancement strategy to maximize mutual information between these two features, we improve the generalization ability of user modality representations. Additionally, we design an alignment strategy for modality data to remove noise from both internal and external perspectives. Extensive experiments on four publicly available datasets demonstrate the effectiveness of our approach.
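Mutual information between two feature views is commonly maximized through a contrastive lower bound such as InfoNCE. The sketch below shows that general technique for a batch of users, pairing each user's explicit interaction feature with the same user's extended interest feature and treating other users as negatives. Whether GUME uses exactly this form is an assumption; the function names, the temperature value `tau`, and the toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def info_nce(view_a, view_b, tau=0.2):
    """Mean InfoNCE loss over a batch.

    Row i of view_a and row i of view_b form the positive pair;
    every other row of view_b serves as a negative for row i.
    """
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / tau for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

# Toy batch of two users (hypothetical values).
explicit = [[1.0, 0.0], [0.0, 1.0]]
extended_aligned = [[0.9, 0.1], [0.1, 0.9]]  # each user's two views agree
extended_swapped = [[0.1, 0.9], [0.9, 0.1]]  # views belong to the wrong user

loss_aligned = info_nce(explicit, extended_aligned)
loss_swapped = info_nce(explicit, extended_swapped)
```

The loss is small when each user's two views agree and large when they are mismatched, so minimizing it pulls the two feature types for the same user together.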

Paper Structure (36 sections, 31 equations, 6 figures, 3 tables)

Introduction
Problem Definition
Methodology
Enhancing User-Item Graph
Constructing Modality Item Graphs
Identifying Semantic Neighbors
Encoding Multiple Modalities
Extracting Explicit Interaction Features
Extracting Extended Interest Features
Attributes Separation for Better Integration
Separating Coarse-Grained Attributes
Separating Fine-Grained Attributes
Capturing Commonalities Through Alignment
Enhancing User Modality Representation
Model Prediction
...and 21 more sections

Figures (6)

Figure 1: The overview of $\textsc{GUME}$. We first utilize a graph convolutional network to extract explicit interaction features and extended interest features. Then, we separate and aggregate the attributes of the explicit interaction features to achieve denoising. We maximize the mutual information between explicit interaction features and extended interest features. Finally, we align information within internal modalities as well as between modalities and external behaviors.
Figure 2: Performance comparison of different item groups. We also compare the recommendation performance of long-tail items with MENTOR.
Figure 3: The distribution of explicit interaction features for users, $\bar{E}_{u,M}$. The left part of the figure shows the distribution without user modality enhancement, while the right part displays the distribution of $\textsc{GUME}$.
Figure 4: Effect of the balancing hyper-parameter $\alpha$.
Figure 5: The Recall@20 results for different pairs of $\beta$ and $\tau$.
...and 1 more figure
