Multi-Agent Transfer Learning via Temporal Contrastive Learning
Weihao Zeng, Joseph Campbell, Simon Stepputtis, Katia Sycara
TL;DR
This work tackles sample-inefficient transfer learning in multi-agent reinforcement learning by combining goal-conditioned policies with unsupervised temporal abstraction. The method pre-trains a goal-conditioned reinforcement learning (GCRL) agent on a source environment, fine-tunes it on a target domain, and learns a temporal latent space via contrastive learning. That latent space is used to build a planning graph whose nodes are latent clusters and whose edges are observed transitions between clusters. Sub-goals derived from the graph guide execution in the target domain, enabling efficient planning and improved interpretability. In Overcooked experiments, the approach matches or exceeds baseline performance with only 21.7% of the training data the baselines require, demonstrating strong gains in sample efficiency and the ability to handle sparse rewards and long-horizon tasks.
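To make the planning graph concrete, the following minimal Python sketch builds a graph whose nodes are cluster ids and whose edges are observed cluster-to-cluster transitions, then reads off a sub-goal sequence. It rests on several assumptions not stated in the summary: each trajectory has already been mapped to a sequence of cluster ids, edges are directed and unweighted, and sub-goals are taken from a breadth-first shortest path. The function names `build_planning_graph` and `subgoal_sequence` are hypothetical, and the paper's actual construction and search procedure may differ.

```python
from collections import deque


def build_planning_graph(cluster_trajectories):
    """Build a directed graph from trajectories of cluster ids.

    Nodes are cluster ids. An edge src -> dst is added whenever a trajectory
    moves from cluster src to a different cluster dst in consecutive steps.
    (Assumption: trajectories are already discretised into cluster ids.)
    """
    graph = {}
    for trajectory in cluster_trajectories:
        for node in trajectory:
            graph.setdefault(node, set())
        for src, dst in zip(trajectory, trajectory[1:]):
            if src != dst:  # ignore steps that stay inside one cluster
                graph[src].add(dst)
    return graph


def subgoal_sequence(graph, start, goal):
    """Return the clusters to visit after `start`, ending at `goal`.

    Uses breadth-first search, so the path has the fewest cluster hops.
    Returns [] if start == goal and None if the goal is unreachable.
    """
    if start == goal:
        return []
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph.get(node, ())):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                path = []
                while neighbour != start:
                    path.append(neighbour)
                    neighbour = parents[neighbour]
                return path[::-1]
            queue.append(neighbour)
    return None


# Two illustrative (made-up) trajectories over five latent clusters.
trajectories = [[0, 0, 1, 2, 2, 3], [0, 1, 4, 3]]
graph = build_planning_graph(trajectories)
subgoals = subgoal_sequence(graph, start=0, goal=3)
print(subgoals)  # [1, 2, 3]
```

In this sketch each returned cluster id would act as an intermediate goal for the goal-conditioned policy. Breaking ties by sorted cluster id is only a convenience that keeps the output deterministic.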
Abstract
This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning. The framework combines goal-conditioned policies with temporal contrastive learning to automatically discover meaningful sub-goals. It pre-trains a goal-conditioned agent, fine-tunes it on the target domain, and uses contrastive learning to construct a planning graph that guides the agent via sub-goals. Experiments on multi-agent coordination tasks in Overcooked demonstrate improved sample efficiency, the ability to solve sparse-reward and long-horizon problems, and enhanced interpretability compared to baselines. The results highlight the effectiveness of integrating goal-conditioned policies with unsupervised temporal abstraction learning for complex multi-agent transfer learning. Compared to state-of-the-art baselines, our method achieves the same or better performance while requiring only 21.7% of the training samples.
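As a rough illustration of what a temporal contrastive objective can look like, the sketch below computes an InfoNCE-style loss for one anchor embedding. The abstract says only that contrastive learning is used, so the specifics here are assumptions: the InfoNCE form, cosine similarity, the `temperature` value, and the convention that the positive is a state close in time to the anchor while negatives are temporally distant states. The function names are hypothetical and this is not necessarily the authors' objective.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax score of the positive.

    Assumption: `positive` embeds a state temporally close to the anchor,
    and `negatives` embed states that are far away in time.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - logits[0]


# Made-up 2-D embeddings: the loss is small when the temporally close state
# lies near the anchor, and large when a distant state lies nearer instead.
anchor = [1.0, 0.0]
aligned_loss = info_nce_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned_loss = info_nce_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned_loss < misaligned_loss)  # True
```

Minimising a loss of this kind pulls temporally adjacent states together in the latent space, which is what would allow clusters of embeddings to serve as nodes of a planning graph.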
