Attribute-driven Disentangled Representation Learning for Multimodal Recommendation

Zhenyang Li; Fan Liu; Yinwei Wei; Zhiyong Cheng; Liqiang Nie; Mohan Kankanhalli

TL;DR

This work tackles interpretability and controllability gaps in multimodal recommender systems by proposing Attribute-Driven Disentangled Representation Learning (AD-DRL). AD-DRL assigns an explicit attribute to each disentangled factor and learns the factors at two levels of granularity: a high level (within- and cross-modality consistency, enforced by intra-modality losses and cross-modal contrastive losses) and a low level (attribute-value relationships, captured by multimodal attention and value prediction). Preference is predicted per attribute as $s_{u,i,k} = \sigma(\mathbf{v}_u^k \cdot \mathbf{v}_i^k)$ and aggregated over the $K$ factors as $s_{u,i} = \sum_{k=1}^K s_{u,i,k}$; the model is optimized with $\mathcal{L} = \mathcal{L}_{BPR} + \alpha \mathcal{L}_{intra} + \beta \mathcal{L}_{inter} + \gamma \mathcal{L}_{low}$, where $\alpha$, $\beta$, and $\gamma$ weight the auxiliary losses. Experiments on three real-world Amazon datasets show AD-DRL achieving state-of-the-art Recall@20 and NDCG@20, with interpretability and controllability demonstrated through visualizations and controllability analyses.
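The per-attribute scoring and the weighted objective summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the plain-list vector representation, and the pairwise form of the BPR term (the standard $-\log \sigma(s_{pos} - s_{neg})$) are assumptions made here for illustration, and the auxiliary losses are passed in as already-computed numbers because the summary does not define them.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function used for each per-attribute score."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attribute_scores(user_factors: List[Sequence[float]],
                     item_factors: List[Sequence[float]]) -> List[float]:
    """Per-attribute scores: sigmoid of the user-item inner product for each factor k."""
    if len(user_factors) != len(item_factors):
        raise ValueError("user and item must have the same number of factors K")
    return [sigmoid(dot(vu, vi)) for vu, vi in zip(user_factors, item_factors)]


def preference_score(user_factors: List[Sequence[float]],
                     item_factors: List[Sequence[float]]) -> float:
    """Overall score: sum of the K per-attribute scores."""
    return sum(attribute_scores(user_factors, item_factors))


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """Assumed standard pairwise BPR term: -log sigmoid(pos - neg)."""
    return -math.log(sigmoid(pos_score - neg_score))


def total_loss(l_bpr: float, l_intra: float, l_inter: float, l_low: float,
               alpha: float, beta: float, gamma: float) -> float:
    """Weighted sum L = L_BPR + alpha*L_intra + beta*L_inter + gamma*L_low."""
    return l_bpr + alpha * l_intra + beta * l_inter + gamma * l_low
```

Because each factor contributes its own term to the sum, the list returned by `attribute_scores` can be inspected directly to see which attribute drives a given recommendation, which is the kind of per-attribute reading shown in Figure 4.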

Abstract

Recommendation algorithms forecast user preferences by correlating user and item representations derived from historical interaction patterns. In pursuit of enhanced performance, many methods focus on learning robust and independent representations by disentangling the intricate factors within interaction data across various modalities in an unsupervised manner. However, such an approach obscures how specific factors (e.g., category or brand) influence the outcomes, making it challenging to regulate their effects. In response to this challenge, we introduce a novel method called Attribute-Driven Disentangled Representation Learning (AD-DRL), which explicitly incorporates attributes from different modalities into the disentangled representation learning process. By assigning a specific attribute to each factor in multimodal features, AD-DRL can disentangle the factors at both attribute and attribute-value levels. To obtain robust and independent representations for each factor associated with a specific attribute, we first disentangle the representations of features both within and across different modalities. We then further enhance the robustness of the representations by fusing the multimodal features of the same factor. Empirical evaluations conducted on three public real-world datasets substantiate the effectiveness of AD-DRL, as well as its interpretability and controllability.

Paper Structure (28 sections, 14 equations, 5 figures, 3 tables)

Introduction
Related Work
Multimodal Collaborative Filtering
Disentangled Representation Learning
Method
Preliminaries
Problem Setting
Notations
Intuition of Attribute-driven Disentanglement
Attribute-driven Disentangled Representation Learning
High-Level Attribute-driven Disentangled Representation Learning
Low-level Attribute-driven Disentangled Representation Learning
Preference Prediction and Model Learning
Preference Prediction
Training Protocol
...and 13 more sections

Figures (5)

Figure 1: High-level and low-level attribute-driven disentangled representation learning modules of our proposed AD-DRL. To disentangle attribute factors in multimodal features, (a) the intra-modality disentanglement module exploits the difference between attribute factors (e.g., price, brand, category, and popularity) within the same modality feature, (b) the inter-modality disentanglement module utilizes the consistency of the same attribute factor in different modality features, and (c) the low-level disentangled representation learning module leverages the intrinsic relationships between items sharing the same attribute value (e.g., popularity values at different levels: Super Popular, Popular, Moderate, Emerging, and Unknown).
Figure 2: Visualization of disentangled vectors from ID embeddings and different modalities, with distinct colors indicating different attributes: yellow for price, orange for popularity, grey for brand, dark blue for category, and light blue for others.
Figure 3: Visualization of the disentangled vectors corresponding to different attributes, where different colors represent different attribute values.
Figure 4: The preference scores of two users ($u$18629 and $u$6805) for different attributes of two items ($i$2099 and $i$3460).
Figure 5: The proportion of items with different attribute values within AD-DRL's recommendations when $\xi$ takes different values in the controllability equation.
