Asymptotically Optimal Regret for Black-Box Predict-then-Optimize

Samuel Tan, Peter I. Frazier

TL;DR

This work addresses black-box predict-then-optimize settings where the objective is unknown and rewards are observed only for the chosen action. It introduces Empirical Soft Regret (ESR), a differentiable surrogate loss that targets downstream regret rather than accuracy metrics such as mean-squared error, enabling gradient-based training of flexible models. Theoretical analysis shows that, for paired data, minimizing ESR achieves asymptotically optimal regret within the class of supervised learning models considered. Empirical results on IHDP and Yahoo News show better decision quality (lower regret and higher CTR) than MSE-based and contextual bandit baselines. The approach offers a practical framework for decision-focused learning with black-box objectives, with potential extensions to richer action spaces, unpaired data, reinforcement learning, and preference-based settings.

Abstract

We consider the predict-then-optimize paradigm for decision-making in which a practitioner (1) trains a supervised learning model on historical data of decisions, contexts, and rewards, and then (2) uses the resulting model to make future binary decisions for new contexts by finding the decision that maximizes the model's predicted reward. This approach is common in industry. Past analysis assumes that rewards are observed for all actions for all historical contexts, which is possible only in problems with special structure. Motivated by problems from ads targeting and recommender systems, we study new black-box predict-then-optimize problems that lack this special structure and where we only observe the reward from the action taken. We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training compared to classical accuracy-based metrics like mean-squared error. This loss function targets the regret incurred when taking a suboptimal decision; because the regret is generally not differentiable, we propose a differentiable "soft" regret term that allows the use of neural networks and other flexible machine learning models that depend on gradient-based training. In the particular case of paired data, we show theoretically that optimizing our loss function yields asymptotically optimal regret within the class of supervised learning models. We also show that, on real-world decision-making problems in news recommendation and personalized healthcare, our approach significantly outperforms state-of-the-art benchmark methods from contextual bandits and conditional average treatment effect estimation.
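
To make the idea of a differentiable "soft" regret concrete, the following is a minimal sketch for a single context with a binary decision. It is not the paper's ESR definition, which is not reproduced here; it only illustrates the general technique of replacing the hard argmax with a sigmoid. The names `hard_regret`, `soft_regret`, `f0`, `f1`, `r0`, `r1` are hypothetical, and the sketch assumes that rewards for both actions are available for the context (for example, from a pair of matched observations), with `k` acting as a sharpness parameter.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hard_regret(f0, f1, r0, r1):
    """Regret of acting on the model's argmax (ties go to action 0).

    f0, f1: predicted rewards for actions 0 and 1 (hypothetical names).
    r0, r1: rewards for actions 0 and 1, assumed known for this context.
    This is piecewise constant in (f0, f1), so its gradient is zero
    almost everywhere and it cannot drive gradient-based training.
    """
    chosen_reward = r1 if f1 > f0 else r0
    return max(r0, r1) - chosen_reward


def soft_regret(f0, f1, r0, r1, k):
    """Smooth surrogate: action 1 is 'chosen' with weight sigmoid(k * (f1 - f0)).

    Larger k makes the surrogate closer to hard_regret; smaller k makes it
    smoother. This is an illustrative assumption, not the paper's formula.
    """
    p1 = sigmoid(k * (f1 - f0))
    expected_reward = p1 * r1 + (1.0 - p1) * r0
    return max(r0, r1) - expected_reward
```

In this sketch, a model that ranks the two actions incorrectly incurs a soft regret that grows toward the true reward gap as `k` increases, while a model that is indifferent (`f0 == f1`) incurs exactly half of the gap.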

Paper Structure

This paper contains 19 sections, 2 theorems, 36 equations, 2 figures, 3 tables.

Key Result

Theorem 5.4

Define $\theta^*$ as the minimizer of the expected regret within $\Theta$, and define $\hat{\theta}_{ESR,n}$ as the minimizer of the ESR function over $n$ datapoints. Then for $k \geq n^{1/4} \log n$ and under the above assumptions, with probability exponentially going to 1.
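
As a small illustration of the condition $k \geq n^{1/4} \log n$ in the theorem, the sketch below computes that threshold for a few sample sizes. The function name `min_sharpness` is hypothetical, and the sketch assumes that $\log$ denotes the natural logarithm, which the statement above does not specify.

```python
import math


def min_sharpness(n):
    """Smallest k satisfying k >= n**(1/4) * log(n).

    Assumes a natural logarithm; the base is not stated in the theorem.
    """
    return n ** 0.25 * math.log(n)


# The threshold grows slowly with the number of datapoints n.
for n in (100, 10_000, 1_000_000):
    print(n, round(min_sharpness(n), 2))
```

Under this reading, the required sharpness grows much more slowly than $n$ itself, so the condition is mild for moderately sized datasets.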

Figures (2)

  • Figure 1: Dependence of ESR function on $k$, for fixed $w$ and $f(1,w) > f(0,w)$.
  • Figure 2: 95% CI for CTR of ESR loss-trained model vs. other methods over ten days.

Theorems & Definitions (4)

  • Theorem 5.4
  • Proposition 6.1
  • proof
  • proof