Model-Free Robust Reinforcement Learning with Sample Complexity Analysis

Yudan Wang; Shaofeng Zou; Yue Wang

TL;DR

This paper develops model-free DR-RL algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence, and provides finite-sample analyses for all three. It is the first model-free DR-RL approach with finite sample complexity for total variation and Chi-square divergence uncertainty sets.

Abstract

Distributionally Robust Reinforcement Learning (DR-RL) aims to derive a policy that optimizes the worst-case performance within a predefined uncertainty set. Despite extensive research, previous DR-RL algorithms have predominantly been model-based, and few model-free methods offer convergence guarantees or sample complexities. This paper proposes a model-free DR-RL algorithm that leverages the Multi-level Monte Carlo (MLMC) technique to close this gap. Our approach integrates a threshold mechanism that ensures the algorithm requires only finitely many samples to implement, a significant improvement over previous model-free algorithms. We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence, and provide finite-sample analyses in all three cases. Our algorithms are the first model-free DR-RL approach with finite sample complexity for total variation and Chi-square divergence uncertainty sets, and they also offer improved sample complexity and broader applicability than existing model-free DR-RL algorithms for the KL divergence model. The complexities of our method are the tightest results for all three uncertainty models in model-free DR-RL, underscoring the effectiveness and efficiency of our algorithm and highlighting its potential for practical applications.
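To make the threshold idea concrete, here is a minimal Python sketch of a truncated MLMC estimator. It is an illustration under stated assumptions, not the paper's exact construction: the level is drawn from a geometric distribution and capped at `n_max`, and the names `threshold_mlmc`, `sample`, `plug_in`, `n_max`, and `psi` are all hypothetical.

```python
import random

def threshold_mlmc(sample, plug_in, n_max, psi=0.5, rng=random):
    """Return (estimate, samples_used) for one truncated-MLMC draw.

    sample  : hypothetical zero-argument callable returning one i.i.d. draw
    plug_in : hypothetical estimator mapping a non-empty list of draws to a float
    n_max   : threshold on the level; caps the batch at 2**(n_max + 1) draws
    """
    # Level N is geometric, P(N = n) = psi * (1 - psi)**n, capped at n_max.
    n = 0
    while n < n_max and rng.random() >= psi:
        n += 1
    # Probability of the realized level; the cap absorbs the geometric tail.
    p_n = (1 - psi) ** n * (psi if n < n_max else 1.0)

    xs = [sample() for _ in range(2 ** (n + 1))]
    # Multi-level correction: full batch versus the average of its two halves.
    delta = plug_in(xs) - 0.5 * (plug_in(xs[0::2]) + plug_in(xs[1::2]))
    # Single-draw base estimate plus the importance-weighted correction.
    return plug_in(xs[:1]) + delta / p_n, len(xs)
```

Without the cap, the batch size 2^(N+1) can be arbitrarily large. With the cap, each call uses at most 2^(n_max+1) draws, and by telescoping the estimate has the same expectation as the plug-in estimator evaluated on 2^(n_max+1) draws, so the threshold trades a controlled bias for a bounded sample requirement.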

Paper Structure (22 sections, 22 theorems, 76 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Major Contributions
Related Works
Preliminaries and Problem Formulations
Markov Decision Processes
Distributionally Robust MDPs
Strong Duality
Model-free Threshold-MLMC Algorithm
Sample Complexity
Total Variation distance
Chi-square Divergence
KL Divergence
Proof Sketch
Conclusion
Numerical Result
...and 7 more sections

Key Result

Lemma 2.1

(Iyengar, 2005) The optimization problem [...] is equivalent to [...], where $\text{Span}(X)=\max_i X(i)-\min_i X(i)$. If, moreover, [...], then the optimization problem is also equivalent to [...].
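As a small numerical illustration of the Span term, the Python sketch below computes $\text{Span}(X)$ and, as an assumption, the worst-case expectation of $X$ over a total variation ball around a nominal distribution. It assumes a finite state space and the convention that total variation is half the $\ell_1$ distance; the function names `span` and `worst_case_mean_tv` are hypothetical, and the greedy mass transfer is a direct solution of that assumed problem, not the dual form stated in the lemma.

```python
def span(x):
    """Span(X) = max_i X(i) - min_i X(i)."""
    return max(x) - min(x)

def worst_case_mean_tv(p, x, delta):
    """Minimize sum_i q[i] * x[i] over distributions q with TV(q, p) <= delta.

    Assumes TV(q, p) = 0.5 * sum_i |q[i] - p[i]|, so moving a total mass m
    between states costs exactly m of the budget.
    """
    q = list(p)
    i_min = min(range(len(x)), key=lambda i: x[i])
    budget = delta
    # Greedily move mass from the highest-value states to the lowest-value one.
    for i in sorted(range(len(x)), key=lambda i: x[i], reverse=True):
        if i == i_min:
            continue
        move = min(q[i], budget)
        q[i] -= move
        q[i_min] += move
        budget -= move
        if budget <= 0:
            break
    return sum(qi * xi for qi, xi in zip(q, x))

p = [0.5, 0.25, 0.25]   # nominal distribution (illustrative values)
x = [1.0, 4.0, 2.0]     # value vector (illustrative values)
print(span(x))                         # 3.0
print(worst_case_mean_tv(p, x, 0.25))  # 1.25
```

In this example the nominal mean is 2.0 and the worst case drops by exactly `delta * span(x) = 0.75`, because the whole budget moves mass from the maximizing state to the minimizing state. In general the drop is at most `delta * span(x)`.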

Figures (5)

Figure 1: Garnet $\mathcal{G}(20,15)$: (a) TV, (b) $\chi^2$, (c) KL uncertainty set
Figure 2: Recycling Robot: (a) TV, (b) $\chi^2$ uncertainty set
Figure 3: FrozenLake: (a) TV, (b) $\chi^2$, (c) KL uncertainty set
Figure 4: Gambler: (a) TV, (b) $\chi^2$, (c) KL uncertainty set
Figure 5: T-MLMC vs. MLMC: (a) TV, (b) $\chi^2$ uncertainty set

Theorems & Definitions (38)

Lemma 2.1: Total variation distance
Lemma 2.2: Chi-square
Lemma 2.3: KL divergence
Remark 2.4
Theorem 4.1
Theorem 4.2: Sample Complexity with TV Distance
Theorem 4.3: Sample Complexity with $\chi^2$ Distance
Theorem 4.4: Sample Complexity with KL Distance
Definition B.1: Biased estimation
Proposition B.2: Threshold MLMC
...and 28 more
