Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model

Han-Dong Lim, HyeAnn Lee, Donghwan Lee

TL;DR

This paper analyzes the sample complexity of $Q$-learning when integrated with a model-based approach, and seeks to identify the conditions under which model-based $Q$-learning is more sample-efficient than its model-free counterpart.

Abstract

Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we delve into the sample complexity of $Q$-learning when integrated with a model-based approach. Through theoretical analyses and empirical evaluations, we seek to elucidate the conditions under which model-based $Q$-learning excels in terms of sample efficiency compared to its model-free counterpart.

Paper Structure (26 sections, 12 theorems, 54 equations, 2 figures, 1 algorithm)

Key Result

Lemma 3.2

For $m=\frac{1}{d_{\min}}\ln \frac{2|{\mathcal{S}}||{\mathcal{A}}|}{\delta}$, with probability at least $1-\frac{\delta}{2}$, every state-action pair is visited, i.e.,
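To get a feel for the scale of $m$ in Lemma 3.2, the bound can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function name `visits_needed` and the example values are hypothetical, and it assumes that $d_{\min}$ denotes the smallest probability with which any state-action pair is sampled (this excerpt does not define it).

```python
import math


def visits_needed(d_min: float, num_states: int, num_actions: int, delta: float) -> float:
    """Evaluate m = (1 / d_min) * ln(2 |S| |A| / delta) from Lemma 3.2.

    Assumption: d_min is the minimum sampling probability over all
    state-action pairs; the excerpt does not define it explicitly.
    """
    if not 0.0 < d_min <= 1.0:
        raise ValueError("d_min must lie in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if num_states < 1 or num_actions < 1:
        raise ValueError("state and action counts must be positive")
    return math.log(2 * num_states * num_actions / delta) / d_min


# Hypothetical example: 10 states, 4 actions, d_min = 0.01, delta = 0.1.
m = visits_needed(d_min=0.01, num_states=10, num_actions=4, delta=0.1)
print(math.ceil(m))  # number of samples, rounded up to an integer
```

Note that $m$ grows only logarithmically in $|{\mathcal{S}}||{\mathcal{A}}|/\delta$ but inversely in $d_{\min}$, so rarely visited state-action pairs dominate the requirement.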

Figures (2)

  • Figure 1: Decreasing error between ${\bm{Q}}_k$ and ${\bm{Q}}^*$ in a random MDP. Seven runs are conducted on the same MDP. Moving averages are highlighted as vivid lines.
  • Figure 2: Performance of the synchronous model-based $Q$-learning. Taxi (top row) and FrozenLake (bottom row). For graphs, moving averages are highlighted as vivid lines with a window size of 20 episodes for Taxi and 100 for FrozenLake. For tables, the mean and standard deviation over 20 runs are shown.

Theorems & Definitions (24)

  • Remark 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Lemma 3.5
  • Lemma 3.6
  • Theorem 3.7
  • Lemma B.1: Theorem B.6 in Shalev-Shwartz and Ben-David (2014)
  • Lemma B.2
  • proof
  • ...and 14 more