Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model

Han-Dong Lim; HyeAnn Lee; Donghwan Lee

TL;DR

This paper analyzes the sample complexity of $Q$-learning combined with a model-based approach, and aims to clarify the conditions under which model-based $Q$-learning is more sample-efficient than its model-free counterpart.

Abstract

Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we delve into the sample complexity of $Q$-learning when integrated with a model-based approach. Through theoretical analyses and empirical evaluations, we seek to elucidate the conditions under which model-based $Q$-learning excels in terms of sample efficiency compared to its model-free counterpart.

Paper Structure (26 sections, 12 theorems, 54 equations, 2 figures, 1 algorithm)

This paper contains 26 sections, 12 theorems, 54 equations, 2 figures, 1 algorithm.

Introduction
Preliminaries
Markov decision process
Overview of Q-learning
Switched system theory
Synchronous Model-based Q-learning
Model-based approach
Switched system perspective on SyncMBQ
Concentration inequality for e
Sample complexity of Algorithm 1
Experiments
Error bound
Performance on Benchmark Environments
Conclusion
Notations
...and 11 more sections

Key Result

Lemma 3.2

For $m=\frac{1}{d_{\min}}\ln \frac{2|{\mathcal{S}}||{\mathcal{A}}|}{\delta}$, with probability at least $1-\frac{\delta}{2}$, every state-action pair is visited.

Figures (2)

Figure 1: Decreasing error between ${\bm{Q}}_k$ and ${\bm{Q}}^*$ in a random MDP, over seven runs on the same MDP. Moving averages are highlighted as vivid lines.
Figure 2: Performance of synchronous model-based $Q$-learning on Taxi (top row) and FrozenLake (bottom row). In the graphs, moving averages are highlighted as vivid lines, with a window size of 20 episodes for Taxi and 100 for FrozenLake. The tables show the mean and standard deviation over 20 runs.

Theorems & Definitions (24)

Remark 3.1
Lemma 3.2
Lemma 3.3
Lemma 3.4
Lemma 3.5
Lemma 3.6
Theorem 3.7
Lemma B.1: Theorem B.6 in Shalev-Shwartz and Ben-David (2014)
Lemma B.2
proof
...and 14 more
