Robust Knowledge Transfer in Tiered Reinforcement Learning

Jiawei Huang, Niao He

TL;DR

This work addresses robust parallel transfer in Tiered Reinforcement Learning, where a source low-tier task $M_\text{Lo}$ and a target high-tier task $M_\text{Hi}$ are learned concurrently with unknown task similarity. It introduces Optimal Value Dominance (OVD) and transferable states to characterize when transferring knowledge helps, and then develops robust algorithms for both single and multiple source-task settings. For single-source MAB and RL, the proposed methods balance pessimistic transfer from $M_\text{Lo}$ with online exploration in $M_\text{Hi}$, achieving constant regret on transferable regions and near-optimal performance elsewhere; when $M_\text{Hi} = M_\text{Lo}$, the bound improves over prior results. The framework extends to multiple source tasks with a Trust-till-Failure mechanism, enabling ensemble benefits across larger state-action spaces with a modest log-factor cost in regret. Overall, the work provides theoretical guarantees for robust, parallel transfer in diverse, partially similar tasks with practical guidance for source-task selection and aggregation.

Abstract

In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework in which the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task, reducing the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume that the low-tier and high-tier tasks share the same dynamics or reward functions, and we focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition for our objective, called "Optimal Value Dominance". Under this condition, we propose novel online learning algorithms that, for the high-tier task, achieve constant regret on a subset of states determined by the task similarity and retain near-optimal regret when the two tasks are dissimilar, while for the low-tier task they remain near-optimal without sacrifice. Moreover, we study the setting with multiple low-tier tasks and propose a novel transfer source selection mechanism that ensembles the information from all low-tier tasks and allows provable benefits on a much larger state-action space.
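To make the idea of balancing pessimistic transfer with online exploration more concrete, here is a minimal sketch for a multi-armed bandit. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the Hoeffding-style confidence radius, the slack parameter `eps`, and the exact trust test are all hypothetical choices made for this example. The sketch takes the low-tier arm with the highest lower confidence bound as a transfer candidate, plays it in the high-tier task only while the high-tier upper confidence bound does not contradict the low-tier estimate, and otherwise falls back to plain UCB exploration.

```python
import math


def confidence_radius(count, t):
    """Hoeffding-style radius (an assumed form); infinite for unplayed arms."""
    if count == 0:
        return math.inf
    return math.sqrt(2.0 * math.log(max(t, 2)) / count)


def choose_high_tier_arm(lo_means, lo_counts, hi_means, hi_counts, t, eps=0.0):
    """Return (arm, mode) for the high-tier bandit at round t.

    mode is "transfer" when the low-tier candidate is trusted and
    "explore" when the high-tier task falls back to its own UCB rule.
    """
    n = len(hi_means)
    lo_lcb = [lo_means[i] - confidence_radius(lo_counts[i], t) for i in range(n)]
    hi_ucb = [hi_means[i] + confidence_radius(hi_counts[i], t) for i in range(n)]

    # Pessimistic candidate: the arm whose low-tier value is provably largest.
    candidate = max(range(n), key=lambda i: lo_lcb[i])

    # Trust the candidate only if the low-tier task has data on it and the
    # high-tier optimistic estimate is still consistent with the low-tier value.
    has_lo_data = lo_lcb[candidate] > -math.inf
    if has_lo_data and hi_ucb[candidate] + eps >= lo_lcb[candidate]:
        return candidate, "transfer"

    # Otherwise ignore the source task and explore optimistically.
    return max(range(n), key=lambda i: hi_ucb[i]), "explore"


if __name__ == "__main__":
    # Similar tasks: the low-tier best arm is still plausible in the high tier.
    print(choose_high_tier_arm([0.9, 0.2], [1000, 1000], [0.85, 0.3], [50, 50], t=1000))
    # Dissimilar tasks: high-tier data contradicts the low-tier candidate.
    print(choose_high_tier_arm([0.9, 0.2], [1000, 1000], [0.1, 0.8], [500, 500], t=1000))
```

In this sketch the fallback to UCB is what keeps the high-tier learner from being misled when the tasks are dissimilar, while the trust test lets it skip exploration when the source task's recommendation remains consistent with high-tier observations.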

Paper Structure

This paper contains 53 sections, 38 theorems, 154 equations, 1 figure, 7 algorithms.

Key Result

Theorem 3.1

If the Optimal Value Dominance assumption is violated, then regardless of the optimality of $\text{Alg}^\text{Lo}$, no algorithm pair $(\text{Alg}^\text{Lo}, \text{Alg}^\text{Hi})$ can simultaneously (1) achieve constant regret when ${M_\text{Lo}} = {M_\text{Hi}}$ and (

Figures (1)

  • Figure 1: Regret in the Target Task given Multiple Source Tasks. We report the results when $W$ source tasks are available, for $W=0,1,2,5$. The shaded regions indicate the 96% confidence interval.

Theorems & Definitions (70)

  • Definition 2.1: $\varepsilon$-Close
  • Definition 2.2: $\lambda$-Transferable States
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 5.1
  • Theorem 5.2
  • Lemma 5.2
  • ...and 60 more