MARCO: A Cooperative Knowledge Transfer Framework for Personalized Cross-domain Recommendations

Lili Xie, Yi Zhang, Ruihong Qiu, Jiajun Liu, Sen Wang

TL;DR

MARCO tackles data sparsity in cold-start cross-domain recommendations by deploying a cooperative MARL framework where each agent estimates the contribution of a distinct source domain. It couples multi-source personalized bridges with MAPPO, and introduces an entropy-based action-diversity penalty to stabilize training and counter distributional discrepancies across domains. Empirical results on four Amazon sub-categories show MARCO achieving superior accuracy and robustness against negative transfer, with strong generalization to varying cold-start rates and source-domain configurations. The approach offers practical benefits for scalable, cross-domain personalization by effectively leveraging heterogeneous source-domain signals through coordinated, diverse agent policies.

Abstract

Recommender systems frequently encounter data sparsity issues, particularly when addressing cold-start scenarios involving new users or items. Multi-source cross-domain recommendation (CDR) addresses these challenges by transferring valuable knowledge from multiple source domains to enhance recommendations in a target domain. However, existing reinforcement learning (RL)-based CDR methods typically rely on a single-agent framework, leading to negative transfer issues caused by inconsistent domain contributions and inherent distributional discrepancies among source domains. To overcome these limitations, MARCO, a Multi-Agent Reinforcement Learning-based Cross-Domain recommendation framework, is proposed. It leverages cooperative multi-agent reinforcement learning, where each agent is dedicated to estimating the contribution from an individual source domain, effectively managing credit assignment and mitigating negative transfer. In addition, an entropy-based action diversity penalty is introduced to enhance policy expressiveness and stabilize training by encouraging diversity in the agents' joint actions. Extensive experiments across four benchmark datasets demonstrate MARCO's superior performance over state-of-the-art methods, highlighting its robustness and strong generalization capabilities. The code is at https://github.com/xiewilliams/MARCO.
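
The abstract does not give the form of the entropy-based penalty, so the following is only a minimal sketch of the quantity that entropy regularizers are generally built on: the Shannon entropy of a discrete action distribution, averaged over agents. The function names, the use of discrete distributions, and the per-agent averaging are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def action_entropy(probs):
    """Shannon entropy (in nats) of one discrete action distribution."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-8:
        raise ValueError("probs must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_agent_entropy(agent_probs):
    """Average entropy over agents (hypothetical: one distribution per source domain)."""
    return sum(action_entropy(p) for p in agent_probs) / len(agent_probs)
```

A deterministic distribution such as [1.0, 0.0] has zero entropy, while a uniform one over k actions reaches the maximum, log k. A regularizer would typically scale such a term by a coefficient and add it to the training objective; how MARCO does this across domains is not stated in this summary.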

Paper Structure

This paper contains 30 sections, 14 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Comparison between single-RL and MARL policies in cross-domain recommendation. Source-domain features are first transformed into personalized information. RL policies then determine each domain’s contribution to the highly relevant feature space. The radius of each circle denotes the relative contribution of each domain. In the single-RL setting, the contributions of domains 1, 2, and $n$ are inaccurately estimated due to the absence of global knowledge and cooperative optimization, whereas multi-agent RL yields more balanced and adaptive contributions across domains.
  • Figure 2: The overview of MARCO. The domain-specific user embeddings $\mathbf{u}^d_i$ and the sequence of item embeddings $\mathcal{S}^d_{u_i}$ are trained independently through Matrix Factorization (MF). Personalized bridge modules utilize MLP networks with encoded sequence embeddings $\mathbf{q}^{d}_{u_i}$ as input to generate personalized transformed domain-specific embeddings $\mathbf{e}^{d}_{u_i}$ for user ${u}_{i}$. To leverage the most informative and transferable knowledge from source domains, the Multi-agent Proximal Policy Optimization (MAPPO) framework with a cross-domain entropy term is adopted to determine the weight of domain-specific embeddings $\mathbf{e}^{d}_{u_i}$ and obtain the initial embedding $\mathbf{u}^t_i$ for the cold start user in the target domain to boost the recommendation performance.
  • Figure 3: Visualization of the effectiveness of entropy regularization with the Book domain as the target, comparing three variants: MARCO (orange), MAPPO (green), and MAPPO w/Ent (blue). The entropy regularization in MAPPO leads to a lower prediction error for the orange dots in the bottom-left corner.
  • Figure 4: Transfer learning across cold-start scenarios.
  • Figure 5: Robustness to the number of source domains.
  • ...and 1 more figure
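
The Figure 2 caption describes agents assigning a weight to each domain-specific embedding $\mathbf{e}^{d}_{u_i}$ to form the initial target-domain embedding $\mathbf{u}^t_i$. A minimal sketch of that aggregation step follows, assuming a plain weighted sum with weights normalized to sum to 1. The function name, the normalization, and the list-based vectors are illustrative assumptions; the caption does not specify the exact combination rule.

```python
def combine_domain_embeddings(embeddings, weights, normalize=True):
    """Weighted sum of per-domain user embeddings.

    embeddings: one vector per source domain (stand-ins for e^d_{u_i})
    weights:    one non-negative scalar per domain (stand-ins for agent outputs)
    Returns a single vector (stand-in for the initial embedding u^t_i).
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per domain embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must share the same dimension")
    if normalize:
        # Assumption: rescale weights so they sum to 1.
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must have a positive sum")
        weights = [w / total for w in weights]
    return [sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(dim)]


# Two toy source domains with 2-dimensional embeddings.
user_init = combine_domain_embeddings([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

With weights 3 and 1, the first domain contributes three quarters of the result, so a domain that an agent judges less relevant is down-weighted rather than discarded.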