Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective

Adam N. Elmachtoub, Henry Lam, Haofeng Zhang, Yunfan Zhao

TL;DR

The paper analyzes three data-driven optimization paradigms, Sample Average Approximation (SAA), Estimate-Then-Optimize (ETO), and Integrated-Estimation-Optimization (IEO), in stochastic optimization with a parametric distribution model. It establishes that, when the model class is well-specified and the sample size is large, the regret distributions satisfy first-order stochastic dominance, $G^{ETO} \preceq_{st} G^{IEO} \preceq_{st} G^{SAA}$: ETO is asymptotically the most favorable, owing to the Cramér–Rao bound, and SAA is the worst. When the model is misspecified, the ordering reverses: the regrets of ETO and IEO no longer vanish, so SAA performs best and IEO outperforms ETO. The results extend to constrained and contextual stochastic optimization, using KKT conditions, projection arguments, and delta-method-based normal approximations to derive analogous dominance results. Experiments on synthetic examples (newsvendor, portfolio optimization) and real data illustrate when these findings hold in finite-sample regimes and under varying degrees of misspecification, and when the simpler ETO is preferable to the integrated approach, especially in well-specified, data-rich settings. These insights clarify when the added computational burden of IEO yields meaningful gains and when ETO suffices.

Abstract

In data-driven stochastic optimization, model parameters of the underlying distribution need to be estimated from data in addition to the optimization task. Recent literature considers integrating the estimation and optimization processes by selecting model parameters that lead to the best empirical objective performance. This integrated approach, which we call integrated-estimation-optimization (IEO), can be readily shown to outperform simple estimate-then-optimize (ETO) when the model is misspecified. In this paper, we show that a reverse behavior appears when the model class is well-specified and there is sufficient data. Specifically, for a general class of nonlinear stochastic optimization problems, we show that simple ETO outperforms IEO asymptotically when the model class covers the ground truth, in the strong sense of stochastic dominance of the regret. Namely, the entire distribution of the regret, not only its mean or other moments, is always better for ETO compared to IEO. Our results also apply to constrained, contextual optimization problems where the decision depends on observed features. Whenever applicable, we also demonstrate how standard sample average approximation (SAA) performs the worst when the model class is well-specified in terms of regret, and best when it is misspecified. Finally, we provide experimental results to support our theoretical comparisons and illustrate when our insights hold in finite-sample regimes and under various degrees of misspecification.
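To make the three paradigms concrete, here is a minimal simulation sketch under assumptions that are not taken from the paper: a two-product newsvendor whose demands are independent exponentials with means `A[j] * theta`, sharing one scalar parameter `theta`. The cost values `B` and `H`, the scale vector `A`, the grid search used for IEO, and all function names are hypothetical choices made for illustration. Because the simulated data really come from the assumed exponential family, this is a well-specified setting.

```python
import numpy as np

# Hypothetical setup, not taken from the paper: two products whose demands are
# independent exponentials with means A[j] * theta, sharing one scalar theta.
A = np.array([1.0, 2.0])   # assumed known per-product scale
B, H = 4.0, 1.0            # assumed backorder and holding costs per unit
Q = B / (B + H)            # critical ratio of the newsvendor problem
K = np.log((B + H) / H)    # the Q-quantile of an exponential is mean * K


def empirical_cost(w, z):
    """Average cost of order vector(s) w, shape (..., p), on samples z, shape (n, p)."""
    diff = np.asarray(w)[..., None, :] - z
    cost = H * np.maximum(diff, 0.0) + B * np.maximum(-diff, 0.0)
    return cost.sum(axis=-1).mean(axis=-1)


def true_cost(w, theta):
    """Exact expected cost under the exponential model with parameter theta."""
    m = A * theta
    return float(np.sum(H * (w - m) + (H + B) * m * np.exp(-w / m)))


def saa(z):
    """SAA: minimize the empirical cost directly (an empirical Q-quantile per product)."""
    k = int(np.ceil(z.shape[0] * Q)) - 1
    return np.sort(z, axis=0)[k]


def eto(z):
    """ETO: maximum-likelihood estimate of theta, then plug into the optimal rule."""
    theta_hat = np.mean(z / A)
    return theta_hat * A * K


def ieo(z, grid):
    """IEO: pick the theta whose induced decision has the lowest empirical cost."""
    costs = empirical_cost(grid[:, None] * A * K, z)
    return grid[np.argmin(costs)] * A * K


def simulate(theta0=1.0, n=200, reps=400, seed=0):
    """Return regrets of the three decisions over independent replications."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.3, 3.0, 541)          # grid search is a simplification
    best = true_cost(theta0 * A * K, theta0)   # cost of the true optimal decision
    out = {"SAA": [], "ETO": [], "IEO": []}
    for _ in range(reps):
        z = rng.exponential(scale=A * theta0, size=(n, A.size))
        out["SAA"].append(true_cost(saa(z), theta0) - best)
        out["ETO"].append(true_cost(eto(z), theta0) - best)
        out["IEO"].append(true_cost(ieo(z, grid), theta0) - best)
    return {name: np.array(vals) for name, vals in out.items()}


if __name__ == "__main__":
    for name, regret in simulate().items():
        print(name, "mean regret:", regret.mean())
```

The decision here is two-dimensional while the parameter is one-dimensional, so IEO searches a smaller set of decisions than SAA and the two generally differ. In a well-specified setup like this one, the paper's theory suggests the regrets should tend to be ordered ETO, then IEO, then SAA for large `n`, although any single finite-sample run is noisy.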

Paper Structure

This paper contains 57 sections, 29 theorems, 323 equations, 8 figures, 1 table, and 1 algorithm.

Key Result

Proposition 1.A

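Suppose the consistency assumption (label EOconsistency:assm in the source) holds. Then $\hat{\bm{w}}^{SAA}\xrightarrow{P}\bm{w}^*$, that is, the SAA solution converges in probability to the optimal decision $\bm{w}^*$.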

Figures (8)

  • Figure 1: A multi-product newsvendor problem in the well-specified setting. The tail probability and moments are calculated over 500 random seeds. For this set of experiments, the number of products is $p=2$.
  • Figure 2: The regret plots show the median, 25th percentile, and 75th percentile over 50 random seeds. For the unconstrained case and the constrained case, the number of products is $p=5$. For the contextual case, the number of products is $p=1$, similar to the setting in Ban and Rudin (2019).
  • Figure 3: Results from well-specified to misspecified model. The regret plots show the median, 25th percentile, and 75th percentile over 50 random seeds.
  • Figure 4: Results for varying the relative dimensions of the decision and the parameter. The regret plots show the median, 25th percentile, and 75th percentile over 50 random seeds. Results are for the unconstrained case, where the parameter dimension is fixed. Sample size is $n=100$.
  • Figure 5: Real-world data experiments with all available features. The regret plots show the median, 25th percentile, and 75th percentile over 50 random seeds.
  • ...and 3 more figures
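
The tail probabilities mentioned for Figure 1 suggest a simple empirical check of first-order stochastic dominance between two samples of regrets. The sketch below is an illustration only, not the paper's procedure; the function names `tail_prob` and `empirically_dominated` are hypothetical. It compares empirical tail probabilities at every pooled sample point, which is sufficient because empirical tails are step functions that change only at sample points.

```python
import numpy as np


def tail_prob(samples, thresholds):
    """Empirical P(sample > t) for each threshold t."""
    samples = np.asarray(samples, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    return (samples[None, :] > thresholds[:, None]).mean(axis=1)


def empirically_dominated(x, y, tol=0.0):
    """True if the empirical tail of x is at most that of y (plus tol) everywhere."""
    t = np.union1d(x, y)  # sorted pooled sample points
    return bool(np.all(tail_prob(x, t) <= tail_prob(y, t) + tol))
```

With finite samples the comparison is noisy, so a small positive `tol` (an assumed tuning knob) can be used to avoid rejecting dominance because of sampling error alone.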

Theorems & Definitions (36)

  • Definition 1: Well-Specified Model Family
  • Definition 2: Misspecified Model Family
  • Definition 3: Regret
  • Definition 4: Stochastic Dominance
  • Proposition 1.A: Consistency of SAA
  • Proposition 1.B: Consistency of ETO
  • Proposition 1.C: Consistency of IEO
  • Proposition 2.A: Asymptotic normality for SAA
  • Proposition 2.B: Asymptotic normality for ETO
  • Proposition 2.C: Asymptotic normality for IEO
  • ...and 26 more
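
For orientation, Definitions 3 and 4 can be written in their standard forms as below. This is a sketch under assumed notation: the symbols $v_0$, $c$, $\bm{z}$, and $P$ are introduced here for illustration, and the paper's exact statements are not reproduced on this page.

```latex
% Regret of a decision w, where v_0(w) is the expected cost under the true
% distribution P and w^* minimizes v_0 over the feasible set (assumed notation):
R(\bm{w}) = v_0(\bm{w}) - v_0(\bm{w}^*), \qquad
v_0(\bm{w}) = \mathbb{E}_{P}\left[ c(\bm{w}, \bm{z}) \right].

% First-order stochastic dominance between real-valued random variables X and Y:
X \preceq_{st} Y \iff \mathbb{P}(X > t) \le \mathbb{P}(Y > t)
\quad \text{for all } t \in \mathbb{R}.
```

Under this convention, $G^{ETO} \preceq_{st} G^{IEO}$ says that ETO's regret exceeds any threshold with probability no larger than IEO's, which is a statement about the whole distribution rather than only its mean.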