A Fast Convergence Theory for Offline Decision Making

Chenjie Mao; Qiaosheng Zhang

TL;DR

The paper studies learnability in offline decision making with general function approximation. It introduces a unified framework, Decision Making with Offline Feedback (DMOF), and an algorithm, Empirical Decision with Divergence (EDD). EDD achieves an instance-dependent upper bound governed by the Empirical Offline Estimation Coefficient (EOEC); for Markovian sequential problems with partial data coverage, this bound yields a fast convergence rate of $1/N$, where $N$ is the dataset size. A complementary lower bound based on OEC formalizes a hardness measure, connects it to the EOEC upper bound, and gives a minimax perspective on the gap between the two. The resulting theory covers offline RL and OPE, with potential extensions to tabular and POMDP settings and to online analogs.

Abstract

This paper proposes the first generic fast convergence result under general function approximation for offline decision making problems, which include offline reinforcement learning (RL) and off-policy evaluation (OPE) as special cases. To unify different settings, we introduce a framework called Decision Making with Offline Feedback (DMOF), which captures a wide range of offline decision making problems. Within this framework, we propose a simple yet powerful algorithm called Empirical Decision with Divergence (EDD), whose upper bound can be expressed in terms of a coefficient called the Empirical Offline Estimation Coefficient (EOEC). We show that EOEC is instance-dependent and measures the correlation of the problem. Under partial coverage of the dataset, EOEC decays at a rate of $1/N$, where $N$ is the size of the dataset, endowing EDD with a fast convergence guarantee. Finally, we complement these results with a lower bound in the DMOF framework, which further demonstrates the soundness of our theory.

Paper Structure (38 sections, 17 theorems, 92 equations, 1 table)

Introduction
Organization of This Paper
Related Works
General Function Approximation in Offline Decision Making
Instance-Dependent Bounds in Offline Decision Making
Faster Convergence Rate in Sequential Problems
Works Related to DEC
Preliminaries on $f$-Divergence
Framework: Decision Making with Offline Feedback (DMOF)
$\mathcal{M}$ as an Information Representor
On the Variation of the Loss Function $\mathcal{L}$
On the Data Distribution
Algorithm: Empirical Decision with Divergence (EDD)
Instance-dependent Guarantee of EDD
Fast Convergence Rates for EDD under I.I.D. and Correlation
...and 23 more sections

Key Result

Theorem 1

For any real $M^\star\in\mathcal{M}$ and $\mathcal{D}^\star\sim M^\star$, EDD (eqn:ed2) with a proper choice of its hyper-parameter has its loss bounded as

Theorems & Definitions (45)

Definition 1: $f$-divergence
Example 1: DMOF in Offline RL
Definition 2
Theorem 1
Definition 3: Markovian Sequential Problems
Remark 1: A Comparison Between Markovian Sequential Problems and MDPs
Remark 2
Remark 3
Remark 4
Theorem 2
...and 35 more
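
Since Definition 1 concerns $f$-divergences, a small numerical illustration may help fix ideas. The sketch below uses the standard textbook form $D_f(P\,\|\,Q)=\sum_x q(x)\,f\big(p(x)/q(x)\big)$ for discrete distributions with $q(x)>0$; the paper's own definition is not reproduced on this page and may differ in notation or generality. The names `f_divergence`, `kl_generator`, and `tv_generator` are illustrative and do not come from the paper.

```python
import math


def f_divergence(p, q, f):
    """Textbook f-divergence D_f(P || Q) for discrete P, Q.

    Assumes q[i] > 0 for every i (an assumption of this sketch),
    and that f is convex with f(1) = 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def kl_generator(t):
    # f(t) = t * log(t) yields the KL divergence; 0 * log(0) is taken as 0.
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t):
    # f(t) = |t - 1| / 2 yields the total variation distance.
    return 0.5 * abs(t - 1.0)


p = [0.5, 0.5]
q = [0.25, 0.75]

kl_pq = f_divergence(p, q, kl_generator)  # KL(P || Q)
tv_pq = f_divergence(p, q, tv_generator)  # TV(P, Q)
```

Different choices of the generator `f` recover different divergences from the same formula, which is why a single definition can serve as a common preliminary.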
