Breaking the Curse of Knowledge: Towards Effective Multimodal Recommendation using Knowledge Soft Integration

Kai Ouyang, Chen Tang, Zenghao Chai, Wenhao Zheng, Xiangjin Xie, Xuanji Xiao, Zhi Wang

TL;DR

This work tackles the curse of knowledge in multimodal recommendation, the bias that arises when multimodal features are extracted in isolation from the recommendation task, through Knowledge Soft Integration (KSI). KSI combines a Structure Efficient Injection (SEI) module, which uses a Refined Graph Neural Network (RGNN) to model item structure, with a Semantic Soft Integration (SSI) module, which softly integrates multimodal semantics to make item representations more expressive. The framework is compatible with downstream collaborative filtering models and yields consistent gains over state-of-the-art baselines on three benchmark datasets, with reported improvements of 2.07% to 5.46% and larger gains in settings with more redundancy. Overall, KSI offers a practical way to integrate multimodal knowledge while mitigating the biases introduced by independent feature extraction.

Abstract

A critical challenge in contemporary recommendation systems lies in effectively leveraging multimodal content to enhance recommendation personalization. Although various solutions have been proposed, most fail to account for discrepancies between knowledge extracted through isolated feature extraction and its application in recommendation tasks. Specifically, multimodal feature extraction does not incorporate task-specific prior knowledge, while downstream recommendation tasks typically use these features as auxiliary information. This misalignment often introduces biases in model fitting and degrades performance, a phenomenon we refer to as the curse of knowledge. To address this challenge, we propose a knowledge soft integration framework designed to balance the utilization of multimodal features with the biases they may introduce. The framework, named Knowledge Soft Integration (KSI), comprises two key components: the Structure Efficient Injection (SEI) module and the Semantic Soft Integration (SSI) module. The SEI module employs a Refined Graph Neural Network (RGNN) to model inter-modal correlations among items while introducing a regularization term to minimize redundancy in user and item representations. In parallel, the SSI module utilizes a self-supervised retrieval task to implicitly integrate multimodal semantic knowledge, thereby enhancing the semantic distinctiveness of item representations. We conduct comprehensive experiments on three benchmark datasets, demonstrating KSI's effectiveness. Furthermore, these results underscore the ability of the SEI and SSI modules to reduce representation redundancy and mitigate the curse of knowledge in multimodal recommendation systems.
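
The abstract names two mechanisms, a redundancy-reducing regularization term and a self-supervised retrieval task, without giving their exact form. The sketch below is only an illustration of how such objectives are commonly written, not the paper's formulation: `redundancy_penalty` assumes a decorrelation-style penalty on embedding dimensions, and `retrieval_loss` assumes an InfoNCE-style contrastive objective in which each item embedding should retrieve its own modal feature. Both function names, the `temperature` parameter, and the toy data are hypothetical.

```python
import math
import random


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def retrieval_loss(item_emb, modal_emb, temperature=0.2):
    """Assumed InfoNCE-style loss: item i should rank modal feature i first.

    Both inputs are lists of equal-length vectors with matching row order.
    """
    items = [_normalize(v) for v in item_emb]
    modal = [_normalize(v) for v in modal_emb]
    total = 0.0
    for i, a in enumerate(items):
        # Cosine similarity of item i to every modal feature, scaled.
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in modal]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # negative log-probability of the true pair
    return total / len(items)


def redundancy_penalty(emb):
    """Assumed decorrelation penalty: mean squared off-diagonal correlation.

    Requires at least two embedding dimensions.
    """
    n, d = len(emb), len(emb[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in emb]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        cols.append([(x - mean) / std for x in col])
    total = 0.0
    for j in range(d):
        for k in range(d):
            if j != k:
                corr = sum(a * b for a, b in zip(cols[j], cols[k])) / n
                total += corr ** 2
    return total / (d * (d - 1))


# Toy usage with random vectors standing in for learned embeddings.
rng = random.Random(0)
items = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
shuffled = items[1:] + items[:1]  # modal features paired with the wrong items
aligned_loss = retrieval_loss(items, items)
misaligned_loss = retrieval_loss(items, shuffled)
penalty = redundancy_penalty(items)
```

In this toy setup the retrieval loss is lower when modal features are paired with the correct items than when they are shuffled, and the penalty grows as embedding dimensions become more correlated. How KSI weights such terms against the recommendation loss is not stated in the abstract.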

Paper Structure

This paper contains 29 sections, 22 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Illustration of our proposed framework, Knowledge Soft Integration (KSI). Bold paths are used to denote the backbone network.
  • Figure 2: Performance comparison of our variants study in terms of Recall@20 on the Sports dataset.
  • Figure 3: Study on $\alpha$ and $\beta$.