
Multimodal-enhanced Federated Recommendation: A Group-wise Fusion Approach

Chunxu Zhang, Weipeng Zhang, Guodong Long, Zhiheng Xue, Riting Xia, Bo Yang

TL;DR

This work proposes GFMFR, a novel multimodal fusion mechanism for federated recommendation. It offloads multimodal representation learning to the server, which stores item content and employs a high-capacity encoder to generate expressive representations, alleviating client-side overhead.

Abstract

Federated Recommendation (FR) is a new learning paradigm for tackling the learn-to-rank problem in a privacy-preserving manner. How to integrate multimodal features into federated recommendation remains an open challenge in terms of efficiency, distribution heterogeneity, and fine-grained alignment. To address these challenges, we propose a novel multimodal fusion mechanism for federated recommendation (GFMFR). Specifically, it offloads multimodal representation learning to the server, which stores item content and employs a high-capacity encoder to generate expressive representations, alleviating client-side overhead. Moreover, a group-aware item representation fusion approach enables fine-grained knowledge sharing among similar users while retaining individual preferences. The proposed fusion loss can be plugged into any existing federated recommender system, extending it with multimodal features. Extensive experiments on five public benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal FR baselines.
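The abstract describes group-aware fusion that shares knowledge among similar users, and the Figure 1 caption says the server groups users by their uploaded prediction functions and then computes group-level item embeddings. A minimal sketch of that server-side step follows, assuming each client's prediction function can be flattened into a parameter vector and using plain k-means (farthest-first seeding, k no larger than the number of clients) as a stand-in grouping method. The function names `group_clients` and `group_item_embeddings` and the choice of k-means are hypothetical illustrations, not the paper's procedure.

```python
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def group_clients(heads: Dict[str, Vector], k: int, iters: int = 20) -> Dict[str, int]:
    """Hypothetical grouping: k-means over flattened prediction-head parameters.

    Centroids are seeded by farthest-first traversal so the result is deterministic.
    """
    ids = sorted(heads)
    centroids = [heads[ids[0]]]
    while len(centroids) < k:
        # Next seed: the client farthest from every centroid chosen so far.
        far = max(ids, key=lambda c: min(sq_dist(heads[c], ct) for ct in centroids))
        centroids.append(heads[far])
    assign: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each client joins its nearest centroid.
        new_assign = {
            c: min(range(k), key=lambda g: sq_dist(heads[c], centroids[g])) for c in ids
        }
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: move each non-empty group's centroid to its members' mean.
        for g in range(k):
            members = [heads[c] for c in ids if assign[c] == g]
            if members:
                centroids[g] = mean(members)
    return assign


def group_item_embeddings(
    item_emb: Dict[str, Dict[int, Vector]], assign: Dict[str, int]
) -> Dict[int, Dict[int, Vector]]:
    """Average each item's embedding over the clients that share a group."""
    buckets: Dict[int, Dict[int, List[Vector]]] = {}
    for client, items in item_emb.items():
        for item, vec in items.items():
            buckets.setdefault(assign[client], {}).setdefault(item, []).append(vec)
    return {g: {i: mean(vs) for i, vs in per_item.items()} for g, per_item in buckets.items()}
```

For example, with heads `{"u1": [0.0, 0.1], "u2": [0.1, 0.0], "u3": [5.0, 5.1], "u4": [5.1, 5.0]}` and `k=2`, clients `u1`/`u2` and `u3`/`u4` land in separate groups, and each group then gets its own averaged item embeddings rather than a single global average. That per-group averaging is the point of the sketch: similar clients pool knowledge without being forced onto one global representation.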

Paper Structure

This paper contains 46 sections, 7 equations, 9 figures, 4 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Overview of the proposed GFMFR. The process unfolds in five stages. Before federated optimization begins, the server generates initial multimodal representations from the raw multimodal data (Step ①). In each communication round, clients first train local recommendation models on their private data and upload the resulting item embeddings and prediction functions to the server (Step ②). The server then aggregates the received embeddings and functions globally, groups users based on their prediction functions, and computes group-level item embeddings and prediction functions (Step ③). Using these group-level embeddings, the server trains a multimodal aggregation model and projects the fused representations into the preference space via the group-level prediction functions (Step ④). Finally, the global parameters and group-level preference representations are sent back to the clients to refine their local models (Step ⑤).
  • Figure 2: Ablation Study of Group-aware Multimodal Aggregation Mechanism.
  • Figure 3: Ablation Study of Preference-guided Distillation Strategy.
  • Figure 4: Visualization of client grouping dynamics during training on dataset Tools_and_Home. Each color indicates a distinct group.
  • Figure 5: Hyper-parameter analysis results on dataset Beauty.
  • ...and 4 more figures
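Step ⑤ of Figure 1, together with the abstract's claim that the fusion loss can be plugged into existing federated recommenders, suggests a client objective with two parts: the usual recommendation loss plus a term that uses the group-level preference representations returned by the server. The exact form of that loss (and of the preference-guided distillation in Figure 3) is not given here, so the following is only a sketch under stated assumptions: a dot-product scorer trained with binary cross-entropy on implicit feedback, plus a hypothetical squared-error alignment term, weighted by `lam`, that pulls local item embeddings toward the server's group-level targets. The names `local_loss`, `group_pref`, and `lam` are illustrative, not from the paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product used here as the (assumed) user-item scoring function."""
    return sum(x * y for x, y in zip(a, b))


def bce(score: float, label: int) -> float:
    """Binary cross-entropy on a raw score (logit), in a numerically stable form.

    Equivalent to -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))].
    """
    return math.log1p(math.exp(-abs(score))) + max(score, 0.0) - label * score


def local_loss(
    user: Vector,
    items: Dict[int, Vector],
    interactions: List[Tuple[int, int]],
    group_pref: Dict[int, Vector],
    lam: float,
) -> float:
    """Recommendation loss plus a hypothetical alignment term toward group-level targets."""
    # Mean BCE over observed (item_id, label) pairs.
    rec = sum(bce(dot(user, items[i]), y) for i, y in interactions) / len(interactions)
    # Hypothetical alignment: mean squared distance to the server's group-level
    # preference representation, for items the client actually holds.
    align_terms = [
        sum((a - b) ** 2 for a, b in zip(items[i], target))
        for i, target in group_pref.items()
        if i in items
    ]
    align = sum(align_terms) / len(align_terms) if align_terms else 0.0
    return rec + lam * align
```

Setting `lam=0` recovers the plain local recommender, which mirrors the "plug-in" framing: the extra term only adds a signal on top of whatever base model a federated recommender already trains. The `log1p` form of the cross-entropy avoids overflow in `exp` for large-magnitude scores.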