Beyond IID: data-driven decision-making in heterogeneous environments

Omar Besbes; Will Ma; Omar Mouchtaki

TL;DR

This work develops a data-driven decision framework for heterogeneous environments where past data come from distributions within a radius $\epsilon$ of an unknown future distribution, and analyzes the resulting worst-case regret. It introduces a general reduction that upper-bounds asymptotic regret via a distributionally robust optimization (DRO) surrogate with a single past distribution, enabling tractable analysis of a broad policy class including SAA. An integral probability metric (IPM) based methodology links the heterogeneity notion to problem structure through an approximation parameter, yielding problem-specific regret bounds for canonical problems such as Newsvendor and Pricing, and revealing both the strengths and failures of SAA under different distances. To address SAA shortcomings, the paper designs rate-optimal policies, notably a Wasserstein-pricing policy that deflates SAA by $\delta=\sqrt{M\epsilon}$ to achieve tight $O(\sqrt{M\epsilon})$ regret guarantees, and discusses general DRO/RDRO-based approaches for robust, scalable solutions. The results highlight the crucial role of the heterogeneity type and problem structure in guiding the choice of data-driven policies and provide a principled path toward rate-optimal, robust decision-making in non-iid settings.
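The deflation idea behind the Wasserstein-pricing policy can be illustrated with a minimal sketch. It assumes a posted-price setting in which customer valuations lie in $[0, M]$ and revenue at price $p$ is $p$ times the probability that a valuation is at least $p$, and it reads "deflates SAA by $\delta$" as subtracting $\delta=\sqrt{M\epsilon}$ from the SAA price. The function names `saa_price` and `deflated_price` are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def saa_price(values: Sequence[float]) -> float:
    """Return a price maximizing empirical revenue p * (share of samples >= p)."""
    n = len(values)
    best_price, best_revenue = 0.0, -1.0
    # An empirical-revenue maximizer can always be found among the sample values.
    for p in sorted(set(values)):
        revenue = p * sum(v >= p for v in values) / n
        if revenue > best_revenue:  # strict '>' keeps the smallest maximizer on ties
            best_price, best_revenue = p, revenue
    return float(best_price)


def deflated_price(values: Sequence[float], eps: float, M: float) -> float:
    """SAA price lowered by delta = sqrt(M * eps), floored at zero (assumed reading)."""
    delta = math.sqrt(M * eps)
    return max(saa_price(values) - delta, 0.0)


if __name__ == "__main__":
    samples = [1.0, 2.0, 2.0, 4.0]  # illustrative valuations, not data from the paper
    print(saa_price(samples))
    print(deflated_price(samples, eps=0.25, M=4.0))
```

Lowering the price is a hedge against a one-sided risk: if past valuations were slightly higher than future ones, a price set exactly at the empirical optimum can lose a large share of sales, whereas a slightly lower price gives up only a small amount of revenue per sale.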

Abstract

How should one leverage historical data when past observations are not perfectly indicative of the future, e.g., due to the presence of unobserved confounders which one cannot "correct" for? Motivated by this question, we study a data-driven decision-making framework in which historical samples are generated from unknown and different distributions assumed to lie in a heterogeneity ball with known radius, centered around the (also unknown) future (out-of-sample) distribution on which the performance of a decision will be evaluated. This work aims to analyze the performance of central data-driven policies, as well as near-optimal ones, in these heterogeneous environments and to understand the key drivers of performance. We establish a first result that allows one to upper bound the asymptotic worst-case regret of a broad class of policies. Leveraging this result, for any integral probability metric, we provide a general analysis of the performance achieved by Sample Average Approximation (SAA) as a function of the radius of the heterogeneity ball. This analysis is centered around the approximation parameter, a notion of complexity we introduce to capture how the interplay between the heterogeneity and the problem structure impacts the performance of SAA. In turn, we illustrate through several widely-studied problems -- e.g., newsvendor, pricing -- how this methodology can be applied and find that the performance of SAA varies considerably depending on the combination of problem class and heterogeneity. The failure of SAA for certain instances motivates the design of alternative policies to achieve rate-optimality. We derive problem-dependent policies achieving strong guarantees for the illustrative problems described above and provide initial results towards a principled approach for the design and analysis of general rate-optimal algorithms.
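To make the SAA policy concrete for the newsvendor problem mentioned above, here is a minimal sketch. It assumes the standard newsvendor cost with a per-unit underage cost `b` and overage cost `h` (these parameter names and both function names are illustrative, not the paper's notation). Under that cost, an SAA decision is the empirical quantile of the pooled historical demands at level $b/(b+h)$, whether or not the samples share a common distribution.

```python
from typing import Sequence


def saa_newsvendor(demands: Sequence[float], b: float, h: float) -> float:
    """Return the empirical b/(b+h) quantile, which minimizes the average newsvendor cost."""
    ordered = sorted(demands)
    n = len(ordered)
    for k, x in enumerate(ordered, start=1):
        # Pick the smallest order statistic with k / n >= b / (b + h);
        # cross-multiplying avoids floating-point division.
        if k * (b + h) >= b * n:
            return float(x)
    raise ValueError("demands must be non-empty")


def average_cost(x: float, demands: Sequence[float], b: float, h: float) -> float:
    """Average of b * (d - x)^+ + h * (x - d)^+ over the given demands."""
    total = sum(b * max(d - x, 0.0) + h * max(x - d, 0.0) for d in demands)
    return total / len(demands)


if __name__ == "__main__":
    history = [1.0, 2.0, 3.0, 4.0]  # illustrative demands, possibly from different distributions
    order = saa_newsvendor(history, b=3.0, h=1.0)
    print(order, average_cost(order, history, b=3.0, h=1.0))
```

In a heterogeneous environment the historical demands come from distributions that differ from the future one, so the out-of-sample cost of this decision can differ from its in-sample average; how large that gap can be as a function of the radius is what the regret analysis quantifies.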

Paper Structure (57 sections, 41 theorems, 276 equations, 2 tables)

Introduction
Framework description
Contributions
Performance evaluation for general policies (sec:reduction)
Performance drivers for SAA and illustration for central problems (sec:SAA-analysis)
Design and analysis of policies beyond SAA (sec:beyond_SAA)
Impact of heterogeneity on performance across some classical problems
Further Related Work
Preliminaries
Problem formulation
Heterogeneous environments
Data-driven policies and performance
Reduction for policy evaluation
Analysis of SAA for integral probability metrics
Integral probability metrics: definition and examples
...and 42 more sections

Key Result

Proposition 1

There exists a data-driven decision problem in a heterogeneous environment $\mathcal{I} = \left( {\mathcal{X}} , \Xi, \mathcal{P}, d, g \right)$ and a radius of heterogeneity $\epsilon > 0$ such that the SAA policy defined in sec:SAA-analysis satisfies,

Theorems & Definitions (88)

Definition 1: Data-driven decision-making problem in heterogeneous environment
Remark 1: Relations between different notions of heterogeneity
Proposition 1
Definition 2: Empirical triangular convergence
Definition 3: Convexity
Theorem 1: Performance Evaluation through Upper Bound
Definition 4: Integral Probability Metrics
Definition 5: Approximation parameter
Theorem 2: Bounding Uniform Deviation under IPM heterogeneity
Definition 6: Maximal Generator
...and 78 more
