A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning
Adrien Banse, Venkatraman Renganathan, Raphaël M. Jungers
TL;DR
The paper introduces a Cantor-Kantorovich metric between Markov Decision Processes (MDPs) by defining horizon-based trajectory distributions and using the Cantor distance to compare MDP dynamics. It gives an iterative scheme for computing the distance and proves a convergence bound for finite horizons, enabling scalable approximation of the asymptotic distance. The authors demonstrate transfer-learning (TL) forecasting in a grid-world setting, where smaller distances between source and target correlate with larger jump-start rewards and TL gains. The metric offers a practical tool for source selection and performance forecasting in transfer learning for reinforcement learning, with potential extensions to more complex TL scenarios.
Abstract
We extend the notion of Cantor-Kantorovich distance between Markov chains, introduced by Banse et al. (2023), to Markov Decision Processes (MDPs). The proposed metric is well-defined and can be efficiently approximated given a finite horizon. We then provide numerical evidence that this metric can lead to interesting applications in reinforcement learning. In particular, we show that it could be used to forecast the performance of transfer learning algorithms.
