Model-Free Robust Reinforcement Learning with Sample Complexity Analysis

Yudan Wang, Shaofeng Zou, Yue Wang

TL;DR

This paper develops model-free distributionally robust reinforcement learning (DR-RL) algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence, and provides finite sample analyses for all three cases. It is the first model-free DR-RL approach with finite sample complexity for total variation and Chi-square divergence uncertainty sets.

Abstract

Distributionally Robust Reinforcement Learning (DR-RL) aims to derive a policy optimizing the worst-case performance within a predefined uncertainty set. Despite extensive research, previous DR-RL algorithms have predominantly favored model-based approaches, with limited availability of model-free methods offering convergence guarantees or sample complexities. This paper proposes a model-free DR-RL algorithm leveraging the Multi-level Monte Carlo (MLMC) technique to close this gap. Our innovative approach integrates a threshold mechanism that ensures finite sample requirements for algorithmic implementation, a significant improvement over previous model-free algorithms. We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence, and provide finite sample analyses under all three cases. Remarkably, our algorithms represent the first model-free DR-RL approach featuring finite sample complexity for total variation and Chi-square divergence uncertainty sets, while also offering an improved sample complexity and broader applicability compared to existing model-free DR-RL algorithms for the KL divergence model. The complexities of our method establish the tightest results for all three uncertainty models in model-free DR-RL, underscoring the effectiveness and efficiency of our algorithm, and highlighting its potential for practical applications.
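The threshold idea mentioned in the abstract can be illustrated with a generic truncated MLMC estimator. The sketch below is not the paper's algorithm: the function name `threshold_mlmc`, the parameters `n_max` and `psi`, and the choice to clip a geometric level are all assumptions made for illustration. It estimates f(E[X]) for a scalar random variable X while guaranteeing that no call draws more than 2^(n_max + 1) samples.

```python
import numpy as np


def threshold_mlmc(sample, f, n_max, rng, psi=0.5):
    """Truncated MLMC estimate of f(E[X]); returns (estimate, draws used).

    Hypothetical helper: `sample(rng)` draws one scalar X, `f` is the
    (possibly nonlinear) function of the mean, `n_max` is the level cap.
    """
    # Level N ~ Geometric(psi) on {0, 1, ...}, clipped at n_max so the
    # number of draws is bounded by 2 ** (n_max + 1).
    level = min(int(rng.geometric(psi)) - 1, n_max)

    # Probability of the clipped level: the cap absorbs the whole tail.
    if level < n_max:
        p_level = psi * (1.0 - psi) ** level
    else:
        p_level = (1.0 - psi) ** n_max

    xs = np.array([sample(rng) for _ in range(2 ** (level + 1))])

    # Multilevel correction: estimate from the full batch minus the
    # average of the estimates from the even- and odd-indexed halves.
    delta = f(xs.mean()) - 0.5 * (f(xs[0::2].mean()) + f(xs[1::2].mean()))

    # Single-sample base term plus the importance-weighted correction.
    return f(xs[0]) + delta / p_level, len(xs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = [threshold_mlmc(lambda r: r.normal(1.0, 1.0), np.exp, 6, rng)
             for _ in range(2000)]
    print("mean estimate:", np.mean([d[0] for d in draws]))
    print("max draws per call:", max(d[1] for d in draws))
```

Without the cap, the level is unbounded and so is the worst-case number of draws per call. With the cap, the estimator is biased (its expectation equals that of f applied to the mean of 2^(n_max + 1) samples), which is the usual price of a bounded sample budget in this kind of construction.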

Paper Structure

This paper contains 22 sections, 22 theorems, 76 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Lemma 2.1

[iyengar2005robust] The optimization problem […] is equivalent to […], where $\text{Span}(X)=\max_i X(i)-\min_i X(i)$. If, moreover, […], then the optimization problem is also equivalent to […].
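The span operator used in the lemma is simply the range of a vector. A minimal sketch, assuming X is given as a finite array of values (the function name `span` is chosen here for illustration):

```python
import numpy as np


def span(x):
    """Span(X) = max_i X(i) - min_i X(i) for a finite vector X."""
    x = np.asarray(x, dtype=float)
    return float(x.max() - x.min())


print(span([1.0, 4.0, 2.5]))  # prints 3.0
```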

Figures (5)

  • Figure 1: Garnet $\mathcal{G}(20,15)$: (a) TV, (b) $\chi^2$, (c) KL uncertainty sets
  • Figure 2: Recycling Robot: (a) TV, (b) $\chi^2$ uncertainty sets
  • Figure 3: FrozenLake: (a) TV, (b) $\chi^2$, (c) KL uncertainty sets
  • Figure 4: Gambler: (a) TV, (b) $\chi^2$, (c) KL uncertainty sets
  • Figure 5: T-MLMC vs. MLMC: (a) TV, (b) $\chi^2$ uncertainty sets

Theorems & Definitions (38)

  • Lemma 2.1: Total variation distance
  • Lemma 2.2: Chi-square
  • Lemma 2.3: KL divergence
  • Remark 2.4
  • Theorem 4.1
  • Theorem 4.2: Sample Complexity with TV Distance
  • Theorem 4.3: Sample Complexity with $\chi^2$ Distance
  • Theorem 4.4: Sample Complexity with KL Distance
  • Definition B.1: Biased estimation
  • Proposition B.2: Threshold MLMC
  • ...and 28 more
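
The three divergences named in Lemmas 2.1 to 2.3 can be computed for discrete distributions as sketched below. The definitions are the common textbook ones and are an assumption here; the paper's normalization (for example, whether total variation carries the factor 1/2) may differ. The function names and the radius `delta` are illustrative, and the nominal distribution `p` is assumed strictly positive.

```python
import numpy as np


def tv_distance(q, p):
    """Total variation: 0.5 * sum_i |q_i - p_i| (assumed normalization)."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return 0.5 * float(np.abs(q - p).sum())


def chi_square_divergence(q, p):
    """Chi-square divergence: sum_i (q_i - p_i)^2 / p_i, with p_i > 0."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(((q - p) ** 2 / p).sum())


def kl_divergence(q, p):
    """KL divergence: sum_i q_i * log(q_i / p_i), with 0 * log 0 taken as 0."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    mask = q > 0
    return float((q[mask] * np.log(q[mask] / p[mask])).sum())


def in_uncertainty_set(q, p, divergence, delta):
    """True if q lies within radius delta of the nominal distribution p."""
    return divergence(q, p) <= delta


p = [0.5, 0.5]
q = [0.75, 0.25]
print(tv_distance(q, p), chi_square_divergence(q, p), kl_divergence(q, p))
print(in_uncertainty_set(q, p, tv_distance, delta=0.3))
```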