On the Benefit of Optimal Transport for Curriculum Reinforcement Learning
Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen
TL;DR
This paper reframes curriculum reinforcement learning as constrained optimal transport between task distributions to ensure gradual, geometry-aware progression of task difficulty. By replacing KL-based similarity and/or pure performance constraints with a Wasserstein-OT formulation, the authors introduce CURROT, a curriculum method that concentrates probability mass on contexts meeting a performance threshold, and compare it to GRADIENT, which relies on Wasserstein barycenters between initial and target distributions. Through theoretical discussion and extensive experiments across discrete and continuous context spaces, the work demonstrates that OT-based curricula yield faster and more reliable learning, especially in settings with infeasible target tasks or non-Gaussian task distributions. The results highlight the importance of explicit task similarity measures and adaptive constraint handling, and point to future directions in learned distance metrics and hybrid adaptive curricula.
Abstract
Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. Although the potential of curricula in RL has been clearly shown in various works, it is less clear how to generate them for a given learning environment, resulting in various methods aiming to automate this task. In this work, we focus on framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. Identifying key issues of existing methods, we frame the generation of a curriculum as a constrained optimal transport problem between task distributions. Benchmarks show that this way of curriculum generation can improve upon existing CRL methods, yielding high performance in various tasks with different characteristics.
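To make the idea of a curriculum as an interpolation between task distributions concrete, the following is a minimal sketch of Wasserstein (displacement) interpolation in one dimension. It is not the authors' method and omits any performance constraint. It assumes scalar contexts and equally sized sample sets, for which the optimal coupling simply pairs sorted samples. The function names `wasserstein_interpolation_1d` and `wasserstein2_1d`, as well as the example distributions, are hypothetical and chosen only for illustration.

```python
import numpy as np


def wasserstein_interpolation_1d(source, target, alpha):
    """Displacement interpolation between two equal-size 1D sample sets.

    In 1D, the optimal transport plan between two empirical distributions
    with the same number of samples matches the i-th smallest source sample
    to the i-th smallest target sample. Moving each sample a fraction
    `alpha` along its matched pair gives the Wasserstein barycenter with
    weights (1 - alpha, alpha).
    """
    source = np.sort(np.asarray(source, dtype=float))
    target = np.sort(np.asarray(target, dtype=float))
    if source.shape != target.shape:
        raise ValueError("this sketch assumes equally sized sample sets")
    return (1.0 - alpha) * source + alpha * target


def wasserstein2_1d(a, b):
    """2-Wasserstein distance between two equal-size 1D sample sets."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hypothetical example: easy contexts near 0, target contexts in [4, 6].
rng = np.random.default_rng(0)
easy_contexts = rng.normal(loc=0.0, scale=0.1, size=500)
target_contexts = rng.uniform(low=4.0, high=6.0, size=500)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    stage = wasserstein_interpolation_1d(easy_contexts, target_contexts, alpha)
    dist = wasserstein2_1d(stage, target_contexts)
    print(f"alpha={alpha:.2f}  mean context={stage.mean():.3f}  W2 to target={dist:.3f}")
```

In this sketch the distance to the target shrinks linearly in `alpha`, and every intermediate stage places its mass on contexts that lie between the easy and target contexts. A KL-based similarity measure, by contrast, does not take the distance between contexts into account.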
