SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Shuai Zhang; Heshan Devaka Fernando; Miao Liu; Keerthiram Murugesan; Songtao Lu; Pin-Yu Chen; Tianyi Chen; Meng Wang

TL;DR

The first convergence analysis with provable generalization guarantees for SF-DQN with GPI is established, revealing that SF-DQN with GPI outperforms conventional RL approaches in terms of both faster convergence rate and better generalization.

Abstract

This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods such as Q-learning. However, its theoretical foundations remain largely unestablished, especially when the successor features are learned using deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
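The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the standard SF formulation, in which the Q-function of a policy is the inner product of its successor feature psi(s, a) with a task-specific reward weight vector w, and GPI acts greedily with respect to the maximum over the Q-values of several source policies. The function names, the two-dimensional features, and the toy numbers below are hypothetical and are not taken from the paper; in SF-DQN the successor features would come from a trained neural network rather than a fixed table.

```python
from typing import List, Sequence

Vector = Sequence[float]


def q_value(psi: Vector, w: Vector) -> float:
    """Q(s, a) = psi(s, a)^T w for a single state-action pair."""
    return sum(p * x for p, x in zip(psi, w))


def gpi_action(psi_per_policy: List[List[Vector]], w: Vector) -> int:
    """GPI at a fixed state s: argmax over actions of max over source policies.

    psi_per_policy[i][a] is the (assumed given) successor feature of
    source policy i for action a at the current state.
    """
    num_actions = len(psi_per_policy[0])
    best_per_action = [
        max(q_value(psi_i[a], w) for psi_i in psi_per_policy)
        for a in range(num_actions)
    ]
    return max(range(num_actions), key=lambda a: best_per_action[a])


# Hypothetical successor features of two source policies at one state,
# with two actions and two feature dimensions.
psi_policy_0 = [[1.0, 0.0], [0.2, 0.5]]
psi_policy_1 = [[0.1, 0.9], [0.4, 0.4]]

# Hypothetical reward weights of a new task that only rewards feature 1.
w_new_task = [0.0, 1.0]

action = gpi_action([psi_policy_0, psi_policy_1], w_new_task)
print(action)  # prints 0
```

Because the successor features depend only on the shared dynamics and the source policies, evaluating them on a new task requires only the new reward weights w, which is the source of the sample-complexity savings the abstract refers to.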

Paper Structure (44 sections, 16 theorems, 184 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Problem Formulation and Algorithm
Successor feature Deep Q-Network
Theoretical Results
Summary of Major Theoretical Findings
Assumptions
Main Theoretical Findings
Convergence analysis of SF-DQN
Improved Performance with GPI
Improved Performance with the Knowledge Transfer
Technical Challenges, and Comparison with Existing Works
Experiments
Conclusions
...and 29 more sections

Key Result

Theorem 1

Suppose the assumptions in the Assumptions section hold and the initial neuron weights of the SF of task $1$ satisfy [...] for some positive $c_N$. When we select the step size as $\eta_t = \frac{1}{t+1}$, and the size of the replay buffer is [...]. Then, with probability at least $1-q^{-d}$, the weights $\theta^{()}[T]$ from Algorithm 1 satisfy [...], where $C_1 = (2+\gamma)\cdot R_{\max}$, and $C^\star =$ [...]

Figures (5)

Figure 1: Performance of SF-DQN presented in Algorithm 1 on Task 1.
Figure 2: Transfer comparison for SF-DQN and DQN (with GPI)
Figure 3: Additional experiments on synthetic environment
Figure 4: Single source to single target task transfer experiments on Reacher environment
Figure 5: Multiple source/target tasks transfer experiments on Reacher environment

Theorems & Definitions (33)

Theorem 1: Convergence analysis of SF-DQN without GPI
Theorem 2: Convergence analysis of SF-DQN with GPI
Theorem 3: Transfer learning via SF-DQN
Theorem 4: Transfer learning via DQN
Lemma 1: Weyl's inequality, B97
Lemma 2: T12, Theorem 1.6
Lemma 3: Lemma 5.2, V2010
Lemma 4: Lemma 5.3, V2010
Lemma 5: Mean Value Theorem
Definition 1: Definition 5.7, V2010
...and 23 more
