A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

Yunchang Yang; Han Zhong; Tianhao Wu; Bin Liu; Liwei Wang; Simon S. Du

TL;DR

We propose a reduction-based framework that turns any multi-batched algorithm for sequential decision making with instantaneous feedback into a sample-efficient algorithm that handles stochastic delays.

Abstract

We study stochastic delayed feedback in general multi-agent sequential decision making, which includes bandits, single-agent Markov decision processes (MDPs), and Markov games (MGs). We propose a novel reduction-based framework, which turns any multi-batched algorithm for sequential decision making with instantaneous feedback into a sample-efficient algorithm that can handle stochastic delays in sequential decision making. By plugging different multi-batched algorithms into our framework, we provide several examples demonstrating that our framework not only matches or improves existing results for bandits, tabular MDPs, and tabular MGs, but also provides the first line of studies on delays in sequential decision making with function approximation. In summary, we provide a complete set of sharp results for multi-agent sequential decision making with delayed feedback.
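To make the reduction idea concrete, here is a minimal sketch of one way such a wrapper could look. It is not the paper's algorithm, only an illustration under stated assumptions: each batch keeps its policy fixed, the wrapper keeps playing that policy until feedback from as many episodes as the undelayed batch would use has arrived, and feedback still pending when the batch ends is discarded. All names (`run_with_delays`, `choose_policy`, `play_episode`, `sample_delay`) and the toy two-armed bandit with an explore-then-commit learner are hypothetical.

```python
import random


def run_with_delays(batch_lengths, choose_policy, play_episode, sample_delay, seed=0):
    """Hypothetical wrapper around a multi-batched learner.

    batch_lengths[m] is the number of episodes batch m would use without delays.
    Assumption: feedback of the episode played at time t arrives at time
    t + delay, where delay is a non-negative integer drawn by sample_delay.
    """
    rng = random.Random(seed)
    t = 0
    history = []          # observed feedback, one list per batch
    episodes_played = []  # episodes actually played in each batch
    for m, need in enumerate(batch_lengths):
        policy = choose_policy(m, history)  # fixed for the whole batch
        pending = []                        # (arrival_time, feedback) pairs
        observed = []
        played = 0
        while len(observed) < need:
            t += 1
            played += 1
            feedback = play_episode(policy, rng)
            pending.append((t + sample_delay(rng), feedback))
            # Move everything that has arrived by time t into `observed`.
            observed.extend(f for a, f in pending if a <= t)
            pending = [(a, f) for a, f in pending if a > t]
        # Several items may arrive at once, so keep exactly `need` of them.
        history.append(observed[:need])
        episodes_played.append(played)
    return history, episodes_played


# Toy instance (an assumption for illustration): two-armed Bernoulli bandit,
# with explore-then-commit as a two-batch learner.
ARM_MEANS = [0.3, 0.7]


def choose_policy(m, history):
    if m == 0:
        return None  # batch 0: explore uniformly at random
    # Batch 1: commit to the arm with the best empirical mean from batch 0.
    means = []
    for arm in (0, 1):
        rewards = [r for a, r in history[0] if a == arm]
        means.append(sum(rewards) / len(rewards) if rewards else 0.0)
    return max((0, 1), key=lambda arm: means[arm])


def play_episode(policy, rng):
    arm = rng.randrange(2) if policy is None else policy
    reward = 1.0 if rng.random() < ARM_MEANS[arm] else 0.0
    return arm, reward


if __name__ == "__main__":
    _, played = run_with_delays(
        [50, 200], choose_policy, play_episode, lambda rng: rng.randint(0, 5)
    )
    print(played)  # episodes played per batch, including the waiting overhead
```

In this sketch the overhead of delays is paid once per batch: a batch that needs `n` observations plays at most `n` plus the largest delay. This is why a learner with few batches is a natural candidate for such a reduction.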

Paper Structure (28 sections, 24 theorems, 92 equations, 1 table, 6 algorithms)

Introduction
Our Contributions.
Related Works
Bandit/MDP with delayed feedback.
Low switching cost algorithm.
Markov Games.
Preliminary
Notations
Multi-agent Sequential Decision Making
MSDM with Delayed Feedback
Multi-batched Algorithm for MSDMs
Relation with low-switching cost algorithm
A Framework for Sequential Decision Making with Delayed Feedback
Results for Markov Games
Tabular Zero-Sum Markov Game
...and 13 more sections

Key Result

Theorem 1

Assume that in the undelayed environment, we have a multi-batched algorithm with $N_b$ batches, and the regret of the algorithm in $K$ episodes can be upper bounded by $\widetilde{\operatorname{Regret}}(K)$ with probability at least $1-\delta$. Then in the delayed feedback case, with probability at for any $q\in (0,1)$. In addition, if the delays satisfy Assumption ass:subexp, then with probabili

Theorems & Definitions (34)

Definition 1: Coarse Correlated Equilibrium
Theorem 1
Theorem 2
Corollary 1
Theorem 3
proof
Corollary 2
Theorem 4
proof
Corollary 3
...and 24 more
