On The Statistical Complexity of Offline Decision-Making

Thanh Nguyen-Tang; Raman Arora

TL;DR

This work develops a unified minimax theory for offline decision-making with function approximation, identifying the pseudo-dimension of the value-function class and a new family of policy transfer coefficients as the core drivers of learnability. Using policy transfer coefficients, which subsume prior data-coverage notions, the authors derive lower and upper bounds that together give near minimax-optimal rates for offline contextual bandits and MDPs, and extend the analysis to a hybrid offline-online setting with adaptive, Hedge-based procedures. The results show when and how offline data can accelerate online decision-making, and the proposed distribution-shift-aware algorithms, such as OfDM-Hedge and OfDM-Hedge-MDP, adapt to unknown transfer regimes. The findings clarify the roles of data quality and function-class complexity in offline RL with function approximation, and outline open problems in nonparametric settings and fully adaptive hybrids.
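
The TL;DR mentions Hedge-based procedures. As a generic reference point only, here is a minimal sketch of the classical Hedge (exponential-weights) algorithm of Freund and Schapire over K experts with losses in [0, 1]. It is not OfDM-Hedge: how the paper defines its experts and losses is not shown in this extract, so the function names `hedge` and `_softmin` and their interfaces are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _softmin(cumulative: Sequence[float], eta: float) -> List[float]:
    """Normalized weights exp(-eta * L_k), shifted by min(L) for numerical stability."""
    best = min(cumulative)
    w = [math.exp(-eta * (c - best)) for c in cumulative]
    total = sum(w)
    return [wi / total for wi in w]


def hedge(losses: Sequence[Sequence[float]], eta: float) -> Tuple[float, float, List[float]]:
    """Classical Hedge over K experts; losses[t][k] in [0, 1] is expert k's loss at round t.

    Returns (expected loss of Hedge, regret against the best fixed expert, final weights).
    """
    if not losses:
        raise ValueError("need at least one round of losses")
    cumulative = [0.0] * len(losses[0])  # cumulative loss of each expert so far
    hedge_loss = 0.0
    for round_losses in losses:
        p = _softmin(cumulative, eta)  # weights depend only on past losses
        hedge_loss += sum(pk * lk for pk, lk in zip(p, round_losses))
        for k, lk in enumerate(round_losses):
            cumulative[k] += lk
    return hedge_loss, hedge_loss - min(cumulative), _softmin(cumulative, eta)


# Usage: with eta = sqrt(8 ln K / T), the regret is at most sqrt(T ln K / 2).
T, K = 1000, 2
eta = math.sqrt(8 * math.log(K) / T)
total, regret, weights = hedge([[0.2, 0.8]] * T, eta)
```

The learning rate above is the standard worst-case tuning for a known horizon T; the "adaptive" aspect of the paper's procedures is not reproduced here.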

Abstract

We study the statistical complexity of offline decision-making with function approximation, establishing (near) minimax-optimal rates for stochastic contextual bandits and Markov decision processes. The performance limits are captured by the pseudo-dimension of the (value) function class and a new characterization of the behavior policy that strictly subsumes all the previous notions of data coverage in the offline decision-making literature. In addition, we seek to understand the benefits of using offline data in online decision-making and show nearly minimax-optimal rates in a wide range of regimes.
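
For contrast with the paper's policy transfer coefficients (whose definition is not reproduced in this extract), the sketch below computes one of the classical data-coverage notions they are compared against: a single-policy concentrability coefficient for a finite contextual bandit, i.e. the worst-case ratio pi(a|x) / mu(a|x) between a target policy pi and the behavior policy mu over contexts with positive probability. The tabular representation and the function name are assumptions for illustration; this is not the paper's definition.

```python
import math
from typing import Sequence


def single_policy_concentrability(
    context_probs: Sequence[float],
    target: Sequence[Sequence[float]],
    behavior: Sequence[Sequence[float]],
) -> float:
    """Max of pi(a|x) / mu(a|x) over contexts x with P(x) > 0 and actions a with pi(a|x) > 0.

    target[x][a] = pi(a|x) and behavior[x][a] = mu(a|x). Returns math.inf when the
    behavior policy never plays an action that the target policy plays (no coverage).
    """
    worst = 0.0
    for x, px in enumerate(context_probs):
        if px <= 0:
            continue  # contexts that never occur do not matter
        for a, pi_xa in enumerate(target[x]):
            if pi_xa <= 0:
                continue  # only actions the target policy uses need coverage
            mu_xa = behavior[x][a]
            if mu_xa <= 0:
                return math.inf
            worst = max(worst, pi_xa / mu_xa)
    return worst
```

For example, a uniform behavior policy over 4 actions gives a coefficient of 4 for any deterministic target policy, and 1 when the target equals the behavior policy.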

Paper Structure (39 sections, 24 theorems, 105 equations, 2 figures, 1 table, 3 algorithms)

Introduction
Background and Problem Formulation
Stochastic contextual bandits
Offline data
Function approximation
Notation.
Offline Decision-Making as Transfer Learning
Relations with other notions of data coverage
Compared with concentrability coefficients.
Compared with the data diversity of Nguyen-Tang and Arora (2023).
Lower Bounds
Upper Bounds
Offline Data-assisted Online Decision-Making
Lower bounds
Upper bounds
...and 24 more sections

Key Result

Theorem 4.1

For any $C > 0$, $\rho \geq 1$, and $n \geq d \cdot \max\{2^{2\rho - 4} C, C^{1/(\rho-1)}/32\}$, we have

[displayed lower bound not recoverable from this extract]

where the infimum is taken over all offline algorithms $\hat{\pi}(\cdot)$ (randomized mappings from the offline data to a policy).
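
To see which term of the sample-size condition in Theorem 4.1 dominates, here is a small arithmetic helper. It uses d, C, and rho exactly as they appear in the hypothesis; their precise definitions are not reproduced in this extract, and the function name is hypothetical. Since C^{1/(rho-1)} is undefined at rho = 1, the sketch rejects that case rather than guess the intended convention.

```python
def theorem41_min_samples(d: int, C: float, rho: float) -> float:
    """Threshold d * max{2^(2*rho - 4) * C, C^(1/(rho - 1)) / 32} from Theorem 4.1's hypothesis."""
    if d <= 0 or C <= 0:
        raise ValueError("need d >= 1 and C > 0")
    if rho <= 1:
        raise ValueError("C**(1/(rho - 1)) is undefined at rho = 1")
    first = 2 ** (2 * rho - 4) * C     # dominates for moderate C
    second = C ** (1 / (rho - 1)) / 32  # grows quickly as rho approaches 1
    return d * max(first, second)
```

For instance, d = 10, C = 4, rho = 2 gives 10 * max{4, 0.125} = 40 (first term dominates), while d = 1, C = 64, rho = 1.5 gives max{32, 128} = 128 (second term dominates).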

Figures (2)

Figure 1: Hard MDPs

Theorems & Definitions (57)

Definition 2.2: Pseudo-dimension
Definition 2.4: Covering number
Definition 3.1: Policy transfer coefficients
Remark 3.2
Example 3.3
Example 3.4
Theorem 4.1
Theorem 5.1
Remark 5.2: Adaptive to policy transfer coefficients
Remark 5.3: No cost blowup of the Hedge algorithm
...and 47 more
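
Definition 2.2 (pseudo-dimension) is not reproduced in this extract. Assuming the standard Pollard definition (points x_1, ..., x_m are pseudo-shattered if some witnesses r_1, ..., r_m make the indicators 1[f(x_i) >= r_i] realize all 2^m patterns as f ranges over the class), the brute-force sketch below computes it for a finite class on a finite domain. Function names and the table representation (values[f][x]) are hypothetical.

```python
from itertools import combinations, product
from typing import Sequence, Tuple


def is_pseudo_shattered(values: Sequence[Sequence[float]], points: Tuple[int, ...]) -> bool:
    """True if some witnesses r make {1[f(x) >= r_x] : x in points} realize all patterns."""
    # Replacing a witness by the smallest observed value >= it leaves every indicator
    # unchanged, so for a finite class it suffices to try observed values as witnesses.
    candidate_witnesses = [sorted({f[x] for f in values}) for x in points]
    for r in product(*candidate_witnesses):
        patterns = {tuple(f[x] >= rx for x, rx in zip(points, r)) for f in values}
        if len(patterns) == 2 ** len(points):
            return True
    return False


def pseudo_dimension(values: Sequence[Sequence[float]]) -> int:
    """Largest m such that some m domain points are pseudo-shattered by the class."""
    domain_size = len(values[0])
    best = 0
    for m in range(1, domain_size + 1):
        if any(is_pseudo_shattered(values, pts) for pts in combinations(range(domain_size), m)):
            best = m
        else:
            # Subsets of pseudo-shattered sets are pseudo-shattered, so no larger m can succeed.
            break
    return best
```

For example, three constant functions on a 3-point domain have pseudo-dimension 1, while all four {0, 1}-valued functions on 2 points have pseudo-dimension 2. The search is exponential and only meant for toy classes.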
