Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization

Chenbei Lu, Laixi Shi, Zaiwei Chen, Chenye Wu, Adam Wierman

TL;DR

This work proposes to overcome the curse of dimensionality by approximately factorizing the original Markov decision process (MDP) into smaller, independently evolving MDPs. This factorization enables sample-efficient RL algorithms in both model-based and model-free settings; the model-free algorithm is a variant of variance-reduced Q-learning.
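To make "independently evolving" concrete, here is a minimal Python sketch that assumes the simplest case of exclusive scopes, where component $k$'s next state depends only on its own state $s_k$ and action $a_k$, so the joint kernel is the product of the component kernels. The helper `factored_transition`, the dictionary kernel format, and the example numbers are hypothetical illustrations, not the paper's implementation. Because the paper's factorization is approximate, in general this product only approximates the true transition kernel.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical kernel format: (s_k, a_k) -> {next s_k: probability}
Kernel = Dict[Tuple[int, int], Dict[int, float]]


def factored_transition(
    kernels: List[Kernel],
    state: Tuple[int, ...],
    action: Tuple[int, ...],
) -> Dict[Tuple[int, ...], float]:
    """Joint next-state distribution of K independently evolving components.

    Assumes exclusive scopes: component k's next state depends only on
    (state[k], action[k]), so P(s' | s, a) = prod_k P_k(s'_k | s_k, a_k).
    """
    # One list of (next_component_state, prob) pairs per component.
    per_component = [
        list(kernels[k][(state[k], action[k])].items())
        for k in range(len(kernels))
    ]
    joint: Dict[Tuple[int, ...], float] = {}
    # Each combination picks one outcome per component; its probability is the product.
    for combo in itertools.product(*per_component):
        next_state = tuple(s_next for s_next, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        joint[next_state] = prob
    return joint


# Two binary components with illustrative (made-up) kernels.
kernel_0 = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.2, 1: 0.8},
            (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {1: 1.0}}
kernel_1 = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.3, 1: 0.7},
            (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}

dist = factored_transition([kernel_0, kernel_1], state=(0, 0), action=(1, 0))
print(dist)
```

Each component kernel is indexed only by that component's own $(s_k, a_k)$ pairs. A learner can therefore estimate $K$ small tables instead of one table over the joint product space.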

Abstract

Reinforcement Learning (RL) algorithms are known to suffer from the curse of dimensionality, which refers to the fact that large-scale problems often lead to exponentially high sample complexity. A common solution is to use deep neural networks for function approximation; however, such approaches typically lack theoretical guarantees. To provably address the curse of dimensionality, we observe that many real-world problems exhibit task-specific model structures that, when properly leveraged, can improve the sample efficiency of RL. Building on this insight, we propose overcoming the curse of dimensionality by approximately factorizing the original Markov decision process (MDP) into smaller, independently evolving MDPs. This factorization enables the development of sample-efficient RL algorithms in both model-based and model-free settings, with the latter involving a variant of variance-reduced Q-learning. We provide improved sample complexity guarantees for both proposed algorithms. Notably, by leveraging model structure through the approximate factorization of the MDP, the dependence of sample complexity on the size of the state-action space can be exponentially reduced. Numerically, we demonstrate the practicality of our proposed methods through experiments on both synthetic MDP tasks and a wind farm-equipped storage control problem.
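Simple counting illustrates where an exponential gap in the state-action space can come from. The Python sketch below assumes a hypothetical system of $K$ components with exclusive scopes. It compares the number of state-action pairs in the joint MDP, which is a product over components, with the total across the factored MDPs, which is a sum. It only counts table entries. It is not the paper's sample complexity bound, which also depends on $\gamma$, $\epsilon$, $\delta$, and the factorization error $\mathcal{E}_\omega$.

```python
from math import prod
from typing import List, Tuple


def joint_pairs(component_sizes: List[Tuple[int, int]]) -> int:
    """State-action pairs of the full MDP: prod_k |S_k| * |A_k|."""
    return prod(n_s * n_a for n_s, n_a in component_sizes)


def factored_pairs(component_sizes: List[Tuple[int, int]]) -> int:
    """State-action pairs summed over the K smaller MDPs: sum_k |S_k| * |A_k|."""
    return sum(n_s * n_a for n_s, n_a in component_sizes)


# Hypothetical system: 10 components, each with 4 states and 2 actions.
sizes = [(4, 2)] * 10
print(joint_pairs(sizes))     # 1073741824 (= 8**10)
print(factored_pairs(sizes))  # 80 (= 10 * 8)
```

Adding one more component multiplies the joint count by 8 but adds only 8 to the factored count.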

Paper Structure

This paper contains 67 sections, 23 theorems, 235 equations, 6 figures, 4 algorithms.

Key Result

Theorem 5.1

Given any approximate factorization scheme $\omega$, let $\mathcal{E}_\omega = \gamma(1-\gamma)^{-2} \Delta^P_\omega + (1-\gamma)^{-1} \Delta^R_\omega$. For any confidence level $\delta > 0$ and any desired accuracy level $\epsilon \in (0,1)$, with probability at least $1 - \delta$, the output Q-fun… provided that the total number of samples, denoted by $D_\omega$, satisfies: … where $\kappa_p \in [0 …
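The error term $\mathcal{E}_\omega$ in Theorem 5.1 can be evaluated directly from its definition. The Python sketch below transcribes that formula. The function name and the example values of $\Delta^P_\omega$ and $\Delta^R_\omega$ are hypothetical. It also assumes these two quantities are nonnegative measures of how far the factored transition and reward models deviate from the original ones.

```python
def factorization_error(gamma: float, delta_p: float, delta_r: float) -> float:
    """E_omega = gamma * (1 - gamma)^-2 * Delta^P + (1 - gamma)^-1 * Delta^R."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor gamma must lie in [0, 1)")
    horizon = 1.0 / (1.0 - gamma)  # effective horizon 1 / (1 - gamma)
    # Transition error is scaled by gamma * horizon**2, reward error by horizon.
    return gamma * horizon**2 * delta_p + horizon * delta_r


# Exact factorization (both errors zero): the term vanishes.
print(factorization_error(0.9, 0.0, 0.0))    # 0.0
# Illustrative, made-up error levels with gamma = 0.9.
print(factorization_error(0.9, 0.01, 0.05))  # approximately 1.4
```

Because the transition error carries an extra factor of $(1-\gamma)^{-1}$, it dominates $\mathcal{E}_\omega$ as $\gamma$ approaches 1, even when $\Delta^P_\omega$ is small.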

Figures (6)

  • Figure 1: Bipartite Graph Representation and Approximate Factorization.
  • Figure 2: Synchronous Sampling with Exclusive Scopes.
  • Figure 3: Performance on Perfectly Factorizable MDPs.
  • Figure 4: Performance on Imperfectly Factorizable MDPs.
  • Figure 5: Wind Farm-equipped Storage Control.
  • ...and 1 more figure

Theorems & Definitions (47)

  • Definition 3.1
  • Theorem 5.1
  • Theorem 6.1
  • Definition A.1
  • Definition A.2
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • Lemma A.3
  • ...and 37 more