Don't Lose Yourself: Boosting Multimodal Recommendation via Reducing Node-neighbor Discrepancy in Graph Convolutional Network

Zheyu Chen, Jinfeng Xu, Haibo Hu

TL;DR

This work tackles over-smoothing in graph convolutional networks used for multimodal recommendation by introducing RedN$^{\text{n}}$D, which reduces node-neighbor discrepancy to preserve ego-node personalization. The method builds modality-specific heterogeneous graphs, fuses them with a trainable item-item graph, and applies a contrastive, InfoNCE-based alignment between ego nodes and their neighbors while balancing neighbor-layer representations. The model optimizes a joint objective $\mathcal{L} = \mathcal{L}_{rec} + \lambda_c \mathcal{L}_{cl}$, where $\mathcal{L}_{rec}$ is the recommendation loss from a LightGCN backbone and $\mathcal{L}_{cl}$ enforces ego-neighbor alignment through a temperature-scaled contrastive loss. Experiments on three Amazon datasets show substantial gains in Recall@K and NDCG@K across both denser and sparser domains, supporting the claims of improved accuracy and robustness and of effective mitigation of over-smoothing in multimodal graph learning.
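As a rough illustration of the joint objective $\mathcal{L} = \mathcal{L}_{rec} + \lambda_c \mathcal{L}_{cl}$, the NumPy sketch below pairs an in-batch InfoNCE term with a recommendation term. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the in-batch negative sampling, the cosine similarity, the default temperature of 0.2, and the use of a BPR loss for $\mathcal{L}_{rec}$ are all illustrative choices that the summary above does not specify.

```python
import numpy as np

def info_nce(ego, neighbor, temperature=0.2):
    """In-batch InfoNCE: row i of `neighbor` is the positive for row i of `ego`;
    every other row serves as a negative (an assumption of this sketch)."""
    # L2-normalise so that dot products are cosine similarities.
    ego = ego / np.linalg.norm(ego, axis=1, keepdims=True)
    neighbor = neighbor / np.linalg.norm(neighbor, axis=1, keepdims=True)
    logits = ego @ neighbor.T / temperature               # (n, n) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the log-probability of each positive pair.
    return float(-np.mean(np.diag(log_prob)))

def bpr_loss(pos_scores, neg_scores):
    """Pairwise ranking loss -log(sigmoid(pos - neg)), assumed here for L_rec."""
    return float(np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores))))

def joint_loss(rec_loss, cl_loss, lambda_c):
    """L = L_rec + lambda_c * L_cl."""
    return rec_loss + lambda_c * cl_loss

# Toy data: 8 ego embeddings of dimension 16 (hypothetical sizes).
rng = np.random.default_rng(0)
ego = rng.normal(size=(8, 16))
aligned = info_nce(ego, ego + 0.01 * rng.normal(size=ego.shape))
unrelated = info_nce(ego, rng.normal(size=ego.shape))
total = joint_loss(bpr_loss(np.ones(4), np.zeros(4)), aligned, lambda_c=0.1)
```

In this toy setup the contrastive term is smaller when the neighbor representation stays close to its ego node than when it is unrelated to it, which is the behaviour the alignment term is meant to reward. The weight `lambda_c` controls how strongly that alignment trades off against the recommendation loss.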

Abstract

The rapid expansion of multimedia content has led to the emergence of multimodal recommendation systems. These systems have attracted increasing attention because their full use of data from different modalities alleviates the persistent data sparsity problem. As such, multimodal recommendation models can learn personalized information about nodes from visual and textual features. To further alleviate the data sparsity problem, some previous works have introduced graph convolutional networks (GCNs) into multimodal recommendation systems to enhance the semantic representation of users and items by capturing the potential relationships between them. However, adopting GCNs inevitably introduces the over-smoothing problem, which makes nodes too similar. Unfortunately, incorporating multimodal information exacerbates this challenge, because nodes that are too similar lose the personalized information learned from the multimodal data. To address this problem, we propose a novel model that retains the personalized information of ego nodes during feature aggregation by Reducing Node-neighbor Discrepancy (RedN$^{\text{n}}$D). Extensive experiments on three public datasets show that RedN$^{\text{n}}$D achieves state-of-the-art performance on accuracy and robustness, with significant improvements over existing GCN-based multimodal frameworks.

Paper Structure

This paper contains 15 sections, 13 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overall architecture of our RedN$^{\text{n}}$D.
  • Figure 2: Effect of hyper-parameters: $k$, $\lambda$ and $\lambda_c$.
  • Figure 3: Visualization via t-SNE.