Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize
Sanket Shah, Andrew Perrault, Bryan Wilder, Milind Tambe
TL;DR
This paper tackles the challenge of making predictive models more decision-focused in Predict-then-Optimize (PtO) by introducing Efficient Global Losses (EGLs). EGLs combine feature-based parameterization (FBP), which maps prediction features to the parameters of a convex loss, with model-based sampling (MBS), which generates diverse, realistic training samples; together these address Fisher Consistency and sample efficiency. The authors prove that traditional weighted losses can fail Fisher Consistency in PtO, while a WeightedMSE variant under FBP recovers it for linear objectives. Empirically, EGLs achieve state-of-the-art results across four PtO domains, often with an order of magnitude fewer training samples, and outperform the best existing method by nearly 200% when the localness assumption is violated, highlighting the practical viability of decision-focused learning. The work also analyzes computational trade-offs, showing significant speedups driven by improved sample efficiency and targeted sampling strategies, which makes decision-focused training more accessible in practice.
Abstract
Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can the structure of a decision-making task be used to tailor ML models for that specific task?" To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches make restrictive assumptions about the form of these losses and their impact on ML model behavior. These assumptions lead to approaches with high computational cost and, when they are violated in practice, to poor performance. In this paper, we propose solutions to these issues, avoiding the aforementioned assumptions and utilizing the ML model's features to increase the sample efficiency of learning loss functions. We empirically show that our method achieves state-of-the-art results in four domains from the literature, often requiring an order of magnitude fewer samples than comparable methods from past work. Moreover, our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.
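To make the idea of a feature-dependent learned loss concrete, here is a minimal sketch of a weighted MSE whose per-parameter weights are computed from features. It is an illustration only, not the authors' implementation: the linear map `theta`, the softplus link, the `eps` floor, and all function names are assumptions introduced here. The one property it is meant to show is that strictly positive weights keep the loss convex in the predictions.

```python
import numpy as np

def softplus(z):
    # Numerically stable softplus; the output is strictly positive.
    return np.logaddexp(0.0, z)

def feature_weights(features, theta, eps=1e-3):
    """Map per-parameter features of shape (n, d) to positive weights of shape (n,).

    Assumption: a linear map followed by softplus; the source does not
    specify the actual parameterization.
    """
    return softplus(features @ theta) + eps

def weighted_mse(y_pred, y_true, weights):
    """Weighted squared error; convex in y_pred because all weights are > 0."""
    return float(np.sum(weights * (y_pred - y_true) ** 2))

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))  # hypothetical features for 5 predicted parameters
theta = rng.normal(size=3)          # hypothetical loss parameters (would be learned)
y_true = rng.normal(size=5)
y_pred = y_true + 0.1               # predictions that are off by 0.1 everywhere

w = feature_weights(features, theta)
loss = weighted_mse(y_pred, y_true, w)
```

Because the weights are a function of features rather than free parameters fitted separately for each instance, one set of loss parameters can be shared across instances, which is one plausible reading of how feature use could improve sample efficiency.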
