BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation

Qile Fan, Penghang Yu, Zhiyi Tan, Bing-Kun Bao, Guanming Lu

TL;DR

The paper identifies that pre-trained content encoders can produce item features with information drift and omission, weakening user preference modeling in multimodal recommendations. It introduces a similarity-based attribution analysis to diagnose content-feature quality and BeFA, a plug-in, behavior-guided feature adapter that decouples, filters, and reconstructs content features using behavioral signals. BeFA is shown to consistently improve performance across multiple datasets, encoders, and baseline models with modest parameter overhead, indicating strong generalizability. Visualizations corroborate that BeFA focuses content features on relevant item details and reduces noise, suggesting practical impact for real-world multimodal recommendation systems.

Abstract

Multimedia recommender systems use both behavioral information and content information to model user preferences. Typically, they employ pre-trained feature encoders to extract content features and then fuse them with behavioral features. However, pre-trained feature encoders often extract features from the entire content at once, including excessive preference-irrelevant details. We speculate that, as a result, the extracted features may not contain sufficient information to accurately reflect user preferences. To verify this hypothesis, we introduce an attribution analysis method for visually and intuitively analyzing the content features. The results indicate that certain products' content features exhibit information drift and information omission, which reduce the expressive ability of the features. Building upon this finding, we propose an effective and efficient general Behavior-driven Feature Adapter (BeFA) to tackle these issues. This adapter reconstructs the content features under the guidance of behavioral information, enabling them to accurately reflect user preferences. Extensive experiments demonstrate the effectiveness of the adapter across all multimedia recommendation methods. Our code is publicly available at https://github.com/fqldom/BeFA.
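
To make the "decouple, filter, reconstruct" idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function `adapt`, the projection matrices, the sigmoid gating rule, and all dimensions are assumptions made for illustration. The sketch projects one item's content feature into several components, gates each component by its agreement with a behavioral embedding, and maps the gated mixture back to the content space.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adapt(content, behavior, decouple_mats, recon_mat):
    """Hypothetical decouple -> filter -> reconstruct pass for one item."""
    # Decouple: project the content feature into several component vectors
    # that live in the same space as the behavioral embedding.
    components = [matvec(W, content) for W in decouple_mats]
    # Filter: gate each component by its agreement with the behavioral embedding
    # (assumed gating rule: sigmoid of the dot product).
    gates = [sigmoid(sum(c * b for c, b in zip(comp, behavior)))
             for comp in components]
    # Reconstruct: sum the gated components, then map back to the content space.
    mixed = [sum(g * comp[i] for g, comp in zip(gates, components))
             for i in range(len(behavior))]
    return matvec(recon_mat, mixed), gates


# Toy usage with random weights (all sizes are illustrative assumptions).
random.seed(0)
d_content, d_behavior, n_components = 6, 4, 3


def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


decouple_mats = [rand_matrix(d_behavior, d_content) for _ in range(n_components)]
recon_mat = rand_matrix(d_content, d_behavior)
content = [random.gauss(0, 1) for _ in range(d_content)]
behavior = [random.gauss(0, 1) for _ in range(d_behavior)]

adapted, gates = adapt(content, behavior, decouple_mats, recon_mat)
```

In this sketch the adapted feature keeps the dimensionality of the original content feature, so it could be passed to a downstream recommender in place of the raw encoder output. In practice the matrices would be learned jointly with the recommender rather than sampled at random.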

Paper Structure (28 sections, 12 equations, 9 figures, 4 tables)

This paper contains 28 sections, 12 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Illustration of content features that do not accurately reflect the users' preferences. Excessive irrelevant information hinders the recommender system's ability to effectively model the users' true preferences.
  • Figure 2: Results of the visual attribution analysis on the TMALL dataset. The first row contains the original item images, while the second row displays the corresponding heatmaps. The four samples on the left reflect information drift and the four samples on the right reflect information omission.
  • Figure 3: Pipeline of the proposed attribution analysis method.
  • Figure 4: Comparison of parameter tuning methods. (a) Low-Rank Adaptation (b) Soft Prompt Tuning (c) BeFA.
  • Figure 5: Visualization analysis of the effect of adapter on feature purification. The shade of the colour represents the amount of attention weight.
  • ...and 4 more figures
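
The heatmaps in Figure 2 come from the similarity-based attribution analysis whose pipeline is shown in Figure 3. This summary does not describe that pipeline, so the following is only a rough sketch under stated assumptions: each image patch has its own feature vector, the score of a patch is its cosine similarity to a reference vector, and scores are min-max normalized into a heatmap. The names `similarity_heatmap`, `patch_features`, and `reference` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_heatmap(patch_features, reference):
    """Score each patch by similarity to `reference`, scaled to [0, 1].

    patch_features: 2-D grid (list of rows) of patch feature vectors.
    reference: vector the patches are compared against (an assumed stand-in
               for whatever target the paper's pipeline uses).
    """
    scores = [[cosine(p, reference) for p in row] for row in patch_features]
    flat = [s for row in scores for s in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    # Min-max normalization; a constant score map becomes all zeros.
    return [[(s - lo) / span if span else 0.0 for s in row] for row in scores]


# Toy 2x2 grid of 2-dimensional patch features.
patches = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
heat = similarity_heatmap(patches, reference=[1.0, 0.0])
```

In this toy grid the top-left patch is perfectly aligned with the reference and receives the highest heat, while the bottom-right patch points in the opposite direction and receives the lowest. Read this way, information drift would show up as high heat on regions unrelated to the product, and information omission as low heat on the product itself.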