Robust Knowledge Transfer in Tiered Reinforcement Learning

Jiawei Huang; Niao He

TL;DR

This work addresses robust parallel transfer in Tiered Reinforcement Learning, where a source low-tier task $M_\text{Lo}$ and a target high-tier task $M_\text{Hi}$ are learned concurrently with unknown task similarity. It introduces Optimal Value Dominance (OVD) and transferable states to characterize when transferring knowledge helps, and then develops robust algorithms for both single and multiple source-task settings. For single-source MAB and RL, the proposed methods balance pessimistic transfer from $M_\text{Lo}$ with online exploration in $M_\text{Hi}$, achieving constant regret on transferable regions and near-optimal performance elsewhere; when $M_\text{Hi} = M_\text{Lo}$, the bound improves over prior results. The framework extends to multiple source tasks with a Trust-till-Failure mechanism, enabling ensemble benefits across larger state-action spaces with a modest log-factor cost in regret. Overall, the work provides theoretical guarantees for robust, parallel transfer in diverse, partially similar tasks with practical guidance for source-task selection and aggregation.

Abstract

In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework in which the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task, reducing the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume that the low-tier and high-tier tasks share the same dynamics or reward functions, and we focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition for our objective, called "Optimal Value Dominance". Under this condition, we propose novel online learning algorithms that, for the high-tier task, achieve constant regret on a subset of states determined by the task similarity and retain near-optimal regret when the two tasks are dissimilar, while for the low-tier task they remain near-optimal without sacrifice. Moreover, we study the setting with multiple low-tier tasks and propose a novel transfer source selection mechanism that ensembles the information from all low-tier tasks and yields provable benefits on a much larger state-action space.
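To make the idea of transferring pessimistically while staying robust more concrete, here is a minimal toy sketch for the multi-armed bandit case. It is not the paper's algorithm. The function names (`width`, `run_tiered_bandit`), the Hoeffding-style confidence width, the Bernoulli rewards, and the specific trust test are all assumptions made for illustration. The sketch also assumes that the best low-tier mean is at least the best high-tier mean, which is one possible reading of a value-dominance condition; the text above does not spell out the formal definition.

```python
import math
import random


def width(n: int, t: int) -> float:
    """Hoeffding-style confidence width for rewards in [0, 1] (assumed form)."""
    if n == 0:
        return math.inf  # an unpulled arm has an unbounded interval
    return math.sqrt(2.0 * math.log(max(t, 2)) / n)


def run_tiered_bandit(means_lo, means_hi, horizon=5000, seed=0):
    """Toy tiered bandit. Returns (high-tier pseudo-regret, number of transfers).

    Illustrative assumption: max(means_lo) >= max(means_hi).
    """
    rng = random.Random(seed)
    k = len(means_lo)
    n_lo, s_lo = [0] * k, [0.0] * k  # low-tier pull counts and reward sums
    n_hi, s_hi = [0] * k, [0.0] * k  # high-tier pull counts and reward sums
    regret_hi, transfers = 0.0, 0

    def mean(s, n, i):
        return s[i] / n[i] if n[i] > 0 else 0.0

    for t in range(1, horizon + 1):
        # Low tier: plain UCB exploration, never influenced by the high tier.
        a_lo = max(range(k), key=lambda i: mean(s_lo, n_lo, i) + width(n_lo[i], t))
        s_lo[a_lo] += float(rng.random() < means_lo[a_lo])
        n_lo[a_lo] += 1

        # Pessimistic candidate: the arm with the highest lower bound in the low tier.
        cand = max(range(k), key=lambda i: mean(s_lo, n_lo, i) - width(n_lo[i], t))
        lcb_lo = mean(s_lo, n_lo, cand) - width(n_lo[cand], t)
        ucb_hi = mean(s_hi, n_hi, cand) + width(n_hi[cand], t)

        if ucb_hi >= lcb_lo:
            # High-tier data does not yet contradict the low-tier value: transfer.
            a_hi = cand
            transfers += 1
        else:
            # Evidence of dissimilarity: fall back to UCB on high-tier data only.
            a_hi = max(range(k), key=lambda i: mean(s_hi, n_hi, i) + width(n_hi[i], t))
        s_hi[a_hi] += float(rng.random() < means_hi[a_hi])
        n_hi[a_hi] += 1
        regret_hi += max(means_hi) - means_hi[a_hi]

    return regret_hi, transfers


if __name__ == "__main__":
    arms = [0.9, 0.5, 0.3]
    print("identical tasks:", run_tiered_bandit(arms, arms))
    print("dissimilar tasks:", run_tiered_bandit(arms, [0.2, 0.5, 0.8]))
```

In this sketch the high tier keeps following the low tier's pessimistic choice only while its own upper confidence bound is compatible with the low-tier lower bound. Under the stated assumption, an arm that keeps passing this test must be close to optimal in the high tier, and an arm that fails it is abandoned in favor of ordinary exploration. The low tier runs unmodified UCB, mirroring the requirement that the source task makes no sacrifice.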

Paper Structure (53 sections, 38 theorems, 154 equations, 1 figure, 7 algorithms)


Introduction
Closely Related Work
Preliminary and Problem Formulation
Assumptions and Characterization of Transferable States
Lower Bound Results: Necessary Condition for Robust Transfer
Robust Tiered MAB/RL with Single Source Task
Robust Transfer in Tiered Multi-Armed Bandits
Robust Transfer in Tiered Tabular RL
Robust Tiered MAB/RL with Multiple Low-Tier Tasks
Experiments
Conclusion and Future Work
Extended Introduction
Tiered-RL Framework
Other Related Works
Online and Offline RL
...and 38 more sections

Key Result

Theorem 3.1

Under the violation of the Optimal Value Dominance assumption, even regardless of the optimality of $\text{Alg}^\text{Lo}$, for each algorithm pair $(\text{Alg}^\text{Lo}, \text{Alg}^\text{Hi})$, it cannot simultaneously (1) achieve constant regret for the case when $M_\text{Lo} = M_\text{Hi}$ and (

Figures (1)

Figure 1: Regret in the Target Task given Multiple Source Tasks. We report the results when $W$ source tasks are available, with $W = 0, 1, 2, 5$. The shaded regions indicate the 96% confidence interval.

Theorems & Definitions (70)

Definition 2.1: $\varepsilon$-Close
Definition 2.2: $\lambda$-Transferable States
Theorem 3.1
Theorem 3.2
Theorem 4.1
Theorem 4.2
Theorem 4.3
Theorem 5.1
Theorem 5.2
Lemma 5.2
...and 60 more
