Transition Transfer $Q$-Learning for Composite Markov Decision Processes

Jinhang Chai; Elynn Chen; Lin Yang

TL;DR

This work introduces a composite MDP framework in which transition dynamics decompose into a low-rank shared component $L^*$ plus a sparse task-specific component $S^*$, enabling principled transfer in high-dimensional RL. It develops a single-task UCB-$Q$-Learning algorithm for high-dimensional composite MDPs and a transfer-enabled algorithm, UCB-TQL, that leverages a source task to reduce target regret. Both come with guarantees that scale with rank and sparsity rather than with the ambient dimension $d$. The transfer analysis shows that, with enough source data, the target regret attains a rate of $\tilde{O}(\sqrt{eH^5N})$, where $N$ is the number of target-task trajectories and $e$ quantifies the sparse difference between the source and target tasks, so the bound is decoupled from $d$. The work also provides rigorous estimation error bounds for matrix recovery and introduces refined confidence regions that exploit the sparse difference, connecting theory to the practical benefits of transfer under complex transition dynamics.
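As a rough illustration of the decomposition described above, the display below writes the composite structure in symbols. The names $M^*$ (the full transition parameter), $r$ (a rank bound), and $s$ (a sparsity bound) are assumptions introduced here for illustration, as is the reading of $e$ as an entrywise count of source-target differences; the paper's own definitions may differ.

```latex
% Sketch only: M^*, r, and s are illustrative symbols, not taken from the text.
\[
  M^* = L^* + S^*, \qquad
  \operatorname{rank}(L^*) \le r, \qquad
  \|S^*\|_0 \le s .
\]
% Transfer setting (assumed form): source and target share L^* and differ
% only in the sparse part, with e bounding the number of differing entries.
\[
  M^*_{\mathrm{target}} - M^*_{\mathrm{source}}
  = S^*_{\mathrm{target}} - S^*_{\mathrm{source}}, \qquad
  \bigl\| S^*_{\mathrm{target}} - S^*_{\mathrm{source}} \bigr\|_0 \le e .
\]
```

Under this reading, a well-estimated source model leaves only the $e$ differing entries to be learned on the target task, which is consistent with a regret rate that depends on $e$ rather than on $d$.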

Abstract

To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework in which high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. This relaxes the common assumption of purely low-rank transition models, allowing for more realistic scenarios where tasks share core dynamics but maintain individual variations. We propose UCB-TQL (Upper Confidence Bound Transfer $Q$-Learning), designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions. When UCB-TQL is applied to a target task after training on a source task with sufficiently many trajectories, it achieves a regret bound of $\tilde{O}(\sqrt{eH^5N})$ that is independent of the ambient dimension. Here, $N$ is the number of trajectories in the target task, and $e$ quantifies the sparse differences between tasks. This result demonstrates a substantial improvement over single-task RL, obtained by leveraging the structural similarities between the source and target tasks. Our theoretical analysis provides rigorous guarantees for how UCB-TQL exploits shared dynamics while adapting to task-specific variations.
