Centralized Reward Agent for Knowledge Sharing and Transfer in Multi-Task Reinforcement Learning
Haozhe Ma, Zhengding Luo, Thanh Vinh Vo, Kuankuan Sima, Tze-Yun Leong
TL;DR
This paper tackles the sparse-reward challenge in multi-task reinforcement learning (RL) by introducing CenRA, a framework that couples a Centralized Reward Agent (CRA) with multiple policy agents. The CRA distills cross-task knowledge into dense, task-informed knowledge rewards and distributes them to the policy agents to accelerate learning. An information synchronization mechanism balances knowledge sharing according to task similarity and each agent's real-time learning progress. Experiments in discrete and continuous domains, including the Meta-World benchmark, show that CenRA converges faster, transfers robustly to unseen tasks, and yields more stable and balanced performance across tasks than strong baselines. The work highlights centralized reward shaping as a practical route to efficient, transferable multi-task RL. It also outlines limitations and future directions, such as adaptive weighting and extensions to heterogeneous tasks.
Abstract
Reward shaping is an effective way to address the sparse-reward challenge in reinforcement learning (RL): it provides immediate feedback through auxiliary, informative rewards. Building on this strategy, we propose a novel multi-task reinforcement learning framework that integrates a centralized reward agent (CRA) with multiple distributed policy agents. The CRA functions as a knowledge pool. It distills knowledge from various tasks and distributes it to individual policy agents to improve their learning efficiency. Specifically, the shaped rewards serve as a straightforward medium for encoding this knowledge. The framework not only enhances knowledge sharing across established tasks but also adapts to new tasks by transferring meaningful reward signals. We validate the proposed method in both discrete and continuous domains, including the representative Meta-World benchmark. The results demonstrate its robustness in multi-task sparse-reward settings and its effective transferability to unseen tasks.
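To make the data flow described above concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. One shared reward agent serves every task, and each policy agent adds a weighted knowledge reward to its sparse environment reward. The class `CentralizedRewardAgent`, the function `shaped_reward`, the weight `lam`, the lookup-table representation, and the update rule are all hypothetical choices for illustration. The paper's CRA is a learned model, and the abstract does not specify how its rewards are combined or trained.

```python
from typing import Dict, Hashable


class CentralizedRewardAgent:
    """Hypothetical stand-in for the CRA: one reward table shared by all tasks.

    Assumption: states are hashable and shared across tasks, so a value
    learned while serving one task is immediately available to every other
    task, including unseen ones. A real CRA would be a learned model.
    """

    def __init__(self, lr: float = 0.5) -> None:
        self.lr = lr
        self.table: Dict[Hashable, float] = {}

    def knowledge_reward(self, state: Hashable) -> float:
        # States the CRA has never seen yield no extra signal.
        return self.table.get(state, 0.0)

    def update(self, state: Hashable, target: float) -> None:
        # Hypothetical distillation rule: move the stored value toward a target.
        old = self.knowledge_reward(state)
        self.table[state] = old + self.lr * (target - old)


def shaped_reward(env_reward: float, knowledge_reward: float, lam: float = 0.5) -> float:
    """Assumed additive shaping: sparse env reward plus weighted knowledge reward."""
    return env_reward + lam * knowledge_reward


# A single CRA shared by all policy agents.
cra = CentralizedRewardAgent(lr=0.5)

# Task A succeeds along the path s0 -> s1 -> s2; the CRA raises those states' values.
for s in ["s0", "s1", "s2"]:
    cra.update(s, target=1.0)

# Task B (unseen) visits s1 but has not reached its own goal (env reward 0).
r_b = shaped_reward(env_reward=0.0, knowledge_reward=cra.knowledge_reward("s1"))
print(r_b)  # 0.25
```

In this sketch, Task B still receives a dense signal at `s1` even though its own environment reward is zero. This mirrors, in a highly simplified form, the abstract's claim that shaped rewards carry transferable knowledge. The weight `lam` controls how strongly borrowed knowledge can override a task's own sparse reward.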
