Reinforcement Learning with Lookahead Information

Nadav Merlis

TL;DR

This work designs provably efficient learning algorithms that incorporate lookahead information and proves that these algorithms achieve tight regret against a baseline that also has access to lookahead information, linearly increasing the amount of collected reward compared to agents that cannot use lookahead information.

Abstract

We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including transactions, navigation and more. When the environment is known, previous work shows that this lookahead information can drastically increase the collected reward. However, outside of specific applications, existing approaches for interacting with unknown environments are not well-adapted to these observations. In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information - linearly increasing the amount of collected reward compared to agents that cannot handle lookahead information.
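
The contrast between planning with the empirical distribution and planning with estimated expectations can be illustrated with a minimal sketch. It is not the paper's algorithm: it covers only one-step reward lookahead, assumes the transition kernel is known, and omits any exploration bonuses. The function names, array layouts, and the Bellman form used for the lookahead planner (average over observed reward vectors of the best action value) are assumptions made for illustration.

```python
import numpy as np


def plan_vanilla(reward_samples, P):
    """Backward induction that uses only the estimated expected rewards.

    reward_samples: array of shape (H, S, n, A), n observed reward vectors
                    per (step, state); hypothetical layout.
    P:              array of shape (H, S, A, S), transition probabilities,
                    assumed known in this sketch.
    """
    H, S, n, A = reward_samples.shape
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        mean_reward = reward_samples[h].mean(axis=1)   # (S, A)
        q = mean_reward + P[h] @ V[h + 1]              # (S, A)
        V[h] = q.max(axis=1)                           # max of expectations
    return V


def plan_reward_lookahead(reward_samples, P):
    """Backward induction over the empirical reward distribution.

    The agent is assumed to see the realized reward vector before acting,
    so the maximum is taken per sample and then averaged.
    """
    H, S, n, A = reward_samples.shape
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        future = P[h] @ V[h + 1]                       # (S, A)
        q = reward_samples[h] + future[:, None, :]     # (S, n, A)
        V[h] = q.max(axis=2).mean(axis=1)              # expectation of max
    return V


# Small synthetic instance: Bernoulli(0.5) rewards, random transitions.
rng = np.random.default_rng(0)
H, S, A, n = 3, 2, 2, 500
rewards = rng.binomial(1, 0.5, size=(H, S, n, A)).astype(float)
P = rng.dirichlet(np.ones(S), size=(H, S, A))          # (H, S, A, S)

V_vanilla = plan_vanilla(rewards, P)
V_lookahead = plan_reward_lookahead(rewards, P)
print("vanilla value at step 0:  ", V_vanilla[0])
print("lookahead value at step 0:", V_lookahead[0])
```

Because the average of per-sample maxima is never smaller than the maximum of per-action averages, the lookahead planner's value is at least the vanilla planner's value on the same data, which is the gap the abstract refers to.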

Paper Structure (43 sections, 33 theorems, 179 equations, 2 figures, 4 algorithms)


Introduction
Related Work.
Setting and Notations
Reward Lookahead.
Transition Lookahead.
Other Notations.
Comparing the Values of Lookahead Agents and Vanilla RL agents
Reward lookahead.
Transition lookahead.
Planning and Learning with One-Step Reward Lookahead
Regret-Minimization with Reward Lookahead
Proof Concepts
Reinforcement Learning with One-Step Transition Lookahead
Regret-Minimization with Transition Lookahead
Proof Concepts
...and 28 more sections

Key Result

Proposition 0

The optimal value of one-step reward lookahead agents satisfies [...]. Also, given reward observations $\boldsymbol{R}=\{R(a)\}_{a\in\mathcal{A}}$ at state $s$ and step $h$, the optimal policy is [...].

Figures (2)

Figure 1: Two-state prophet-like problem
Figure 2: Random chain: agents start at the left side and must reach its right side to collect a reward.

Theorems & Definitions (60)

Proposition 0
Theorem 0
Proposition 0
Theorem 0
Proposition 0
proof
Remark 1
Lemma 1
proof
Lemma 2: Value-Difference Lemma with Reward Lookahead
...and 50 more
