A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning

Adrien Banse; Venkatraman Renganathan; Raphaël M. Jungers

TL;DR

The paper introduces a Cantor-Kantorovich metric between Markov Decision Processes (MDPs) by defining horizon-based trajectory distributions and using the Cantor distance to compare MDP dynamics. It establishes an iterative computation scheme and a convergence bound for finite horizons, so that the asymptotic distance can be approximated at a chosen horizon. The authors apply the metric to transfer-learning forecasting in a grid-world setting, where smaller distances between source and target correlate with larger jumpstart rewards and transfer gains. The metric is proposed as a practical tool for source selection and performance forecasting in transfer learning for reinforcement learning, with potential extensions to more complex transfer scenarios.

Abstract

We extend the notion of Cantor-Kantorovich distance between Markov chains, introduced by Banse et al. (2023), to the context of Markov Decision Processes (MDPs). The proposed metric is well-defined and can be efficiently approximated given a finite horizon. We then provide numerical evidence that this metric can lead to interesting applications in the field of reinforcement learning. In particular, we show that it could be used for forecasting the performance of transfer learning algorithms.

Paper Structure (6 sections, 1 theorem, 8 equations, 2 figures)

Introduction
The Cantor-Kantorovich metric in the context of MDPs
Preliminaries About Markov Decision Processes
A metric between dynamics of two MDPs
Application to Transfer Learning
Conclusion & Future Outlook

Key Result

Theorem 1

Given a horizon $N > 1$ and two policies $p$ and $q$, it holds that with $r_{p, q}(\mathbf{s}^N) = \min\left\{ \mathbb{P}^N_p(\mathbf{s}^N), \mathbb{Q}^N_q(\mathbf{s}^N)\right\}$.
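The quantity $r_{p, q}(\mathbf{s}^N)$ above is the pointwise minimum of two trajectory probabilities. The following is a minimal sketch of how it could be evaluated for one state trajectory. It assumes finite state and action sets, a known initial distribution, and randomized state-feedback policies; the names `trajectory_probability`, `r_min`, `init`, `kernel` and `policy` are hypothetical and not taken from the paper, and the paper's exact indexing of $\mathbf{s}^N$ may differ.

```python
import numpy as np


def trajectory_probability(traj, init, kernel, policy):
    """Probability of a state trajectory under a state-feedback policy.

    Assumed (hypothetical) data layout:
      init[s]          : probability of starting in state s
      kernel[a, s, s2] : probability of moving from s to s2 under action a
      policy[s, a]     : probability of choosing action a in state s
    """
    prob = init[traj[0]]
    for s, s_next in zip(traj[:-1], traj[1:]):
        # Marginalise over the action chosen by the policy in state s.
        prob *= float(np.dot(policy[s], kernel[:, s, s_next]))
    return prob


def r_min(traj, mdp_p, mdp_q):
    """min{P(traj), Q(traj)} for two (init, kernel, policy) triples."""
    return min(
        trajectory_probability(traj, *mdp_p),
        trajectory_probability(traj, *mdp_q),
    )


# Toy example: two states, two actions, same dynamics, different policies.
init = np.array([1.0, 0.0])
kernel = np.array([
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.2, 0.8], [0.5, 0.5]],  # action 1
])
always_0 = np.array([[1.0, 0.0], [1.0, 0.0]])
always_1 = np.array([[0.0, 1.0], [0.0, 1.0]])

value = r_min((0, 0, 1), (init, kernel, always_0), (init, kernel, always_1))
```

In this toy example the trajectory `(0, 0, 1)` has probability `0.9 * 0.1` under the first policy and `0.2 * 0.8` under the second, so `value` is the smaller of the two.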

Figures (2)

Figure 1: A grid-world of size $10 \times 10$ with a goal in $(4, 4)$. If, at time $k$, the action $u_k = \text{right}$ is chosen, then the probability of going in this direction is $\delta$, and the probability of going in each of the other three directions is $(1-\delta)/3$.
Figure 2: Results of the transfer learning experiment. The x-axis is the Cantor-Kantorovich distance between the target and the sources. The y-axis is the jumpstart metric, i.e. the metric used to assess the performance of the transfer. Green and red dots correspond to sources with $\delta < 1/2$ and $\delta \geq 1/2$, respectively.
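The transition rule in the caption of Figure 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `step_distribution` is hypothetical, and the handling of moves that would leave the grid (here, the agent stays in place) is an assumption, since the caption does not specify it.

```python
# Displacements (row, column) for the four actions.
ACTIONS = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step_distribution(state, action, delta, size=10):
    """Return {next_state: probability} for one grid-world step.

    The chosen direction is followed with probability delta, and each of
    the three other directions with probability (1 - delta) / 3.
    """
    dist = {}
    for name, (d_row, d_col) in ACTIONS.items():
        prob = delta if name == action else (1.0 - delta) / 3.0
        row, col = state[0] + d_row, state[1] + d_col
        # Assumption: a move that would leave the grid keeps the agent in place.
        if not (0 <= row < size and 0 <= col < size):
            row, col = state
        dist[(row, col)] = dist.get((row, col), 0.0) + prob
    return dist


interior = step_distribution((4, 4), "right", delta=0.7)
corner = step_distribution((0, 0), "right", delta=0.7)
```

For the interior cell, `interior` assigns probability `0.7` to `(4, 5)` and `0.1` to each of the other three neighbours. For the corner cell, the two blocked moves accumulate on `(0, 0)` under the stay-in-place assumption.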

Theorems & Definitions (2)

Definition 1: MDP
Theorem 1
