A Dual Perspective on Decision-Focused Learning: Scalable Training via Dual-Guided Surrogates

Paula Rodriguez-Diaz, Kirk Bansak, Elisabeth Paulson

TL;DR

This paper tackles the scalability bottleneck of decision-focused learning in the Predict-then-Optimize framework by introducing Dual-Guided Loss (DGL). DGL uses dual variables $\boldsymbol{\lambda}$ from the downstream optimization to shape a differentiable surrogate, which avoids solver calls for most gradient steps. It decouples optimization from learning by periodically refreshing the duals and, in between, training on dual-adjusted targets with a softmax surrogate over groups, which yields substantial runtime savings. The authors prove an asymptotic regret bound: decision regret is controlled by the surrogate's suboptimality plus a term that vanishes as the temperature $\tau$ goes to zero. They also provide a time-complexity analysis showing that DGL reduces solver overhead compared with state-of-the-art baselines such as SPO+ and QPTL. Empirical results on two combinatorial tasks (assignment-like many-to-one matching and weighted knapsack) show that DGL matches or exceeds baseline decision quality while training orders of magnitude faster and using fewer solver calls, making decision-focused learning practical for larger-scale applications.
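The TL;DR does not spell out the surrogate's exact form. Below is a minimal sketch of a softmax surrogate over groups under three stated assumptions: the task is a many-to-one matching in which each group is one agent choosing one site, $\boldsymbol{\lambda}$ holds one capacity price per site, and the loss is a cross-entropy between dual-adjusted soft decisions built from predicted and target scores. The function names (`soft_decisions`, `dual_guided_ce`) and the cross-entropy choice are hypothetical illustrations, not the authors' DGL implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (shift by the max before exponentiating)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def soft_decisions(scores: Matrix, duals: Sequence[float], tau: float) -> Matrix:
    """Row i is one group (e.g. one agent); column j is one option (e.g. one site).

    Option j's dual-adjusted score is scores[i][j] - duals[j]: its raw score minus
    the shadow price of the capacity it uses. A temperature-tau softmax inside each
    group turns these into a soft one-of-many choice (a hard argmax as tau -> 0).
    """
    return [
        [math.exp(lp) for lp in log_softmax([(s - lam) / tau for s, lam in zip(row, duals)])]
        for row in scores
    ]


def dual_guided_ce(pred: Matrix, target: Matrix, duals: Sequence[float], tau: float):
    """Hypothetical surrogate: cross-entropy of the predicted scores' dual-adjusted
    soft decision against the target scores' soft decision, summed over groups.
    Returns (loss, dloss/dpred). No optimization solver is called."""
    loss, grad = 0.0, []
    for p_row, t_row in zip(pred, target):
        q = [math.exp(lp) for lp in log_softmax([(t - lam) / tau for t, lam in zip(t_row, duals)])]
        log_p = log_softmax([(s - lam) / tau for s, lam in zip(p_row, duals)])
        loss -= sum(qj * lpj for qj, lpj in zip(q, log_p))
        # dCE/dlogit_j = p_j - q_j, and logit_j = (s_j - lam_j) / tau.
        grad.append([(math.exp(lpj) - qj) / tau for qj, lpj in zip(q, log_p)])
    return loss, grad
```

Because the gradient has the closed form $(p - q)/\tau$, each update uses only the cached duals and never calls a solver; as $\tau$ shrinks, each row of `soft_decisions` approaches the one-hot argmax of the dual-adjusted scores.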

Abstract

Many real-world decisions are made under uncertainty by solving optimization problems using predicted quantities. This predict-then-optimize paradigm has motivated decision-focused learning, which trains models with awareness of how the optimizer uses predictions, improving the performance of downstream decisions. Despite its promise, scaling is challenging: state-of-the-art methods either differentiate through a solver or rely on task-specific surrogates, both of which require frequent and expensive calls to an optimizer, often a combinatorial one. In this paper, we leverage dual variables from the downstream problem to shape learning and introduce Dual-Guided Loss (DGL), a simple, scalable objective that preserves decision alignment while reducing solver dependence. We construct DGL specifically for combinatorial selection problems with natural one-of-many constraints, such as matching, knapsack, and shortest path. Our approach (a) decouples optimization from gradient updates by solving the downstream problem only periodically; (b) between refreshes, trains on dual-adjusted targets using simple differentiable surrogate losses; and (c) as refreshes become less frequent, drives training cost toward standard supervised learning while retaining strong decision alignment. We prove that DGL has asymptotically diminishing decision regret, analyze runtime complexity, and show on two problem classes that DGL matches or exceeds state-of-the-art DFL methods while using far fewer solver calls and substantially less training time. Code is available at https://github.com/paularodr/Dual-Guided-Learning.
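Step (a) needs dual variables from the downstream problem, and the abstract does not say how they are obtained. As one self-contained illustration for the weighted knapsack task, the LP relaxation of a knapsack has a closed-form optimal dual price for its capacity constraint, given by the classical greedy value-to-weight ratio rule. Using the LP relaxation here is an assumption, and the function names are hypothetical rather than taken from the authors' code.

```python
from typing import List, Sequence


def knapsack_lp_dual(values: Sequence[float], weights: Sequence[float], capacity: float) -> float:
    """Optimal dual price of the capacity constraint in the LP relaxation
        max sum_i v_i x_i  s.t.  sum_i w_i x_i <= capacity,  0 <= x_i <= 1,
    assuming positive values and weights. Items are taken greedily by v_i / w_i;
    the price is the ratio of the first item that does not fit entirely, or 0
    if every item fits (the capacity constraint is then slack)."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    used = 0.0
    for i in order:
        if used + weights[i] > capacity:
            return values[i] / weights[i]
        used += weights[i]
    return 0.0


def dual_adjusted(values: Sequence[float], weights: Sequence[float], lam: float) -> List[float]:
    """Reduced value of each item: its value minus the shadow cost of its weight."""
    return [v - lam * w for v, w in zip(values, weights)]
```

For values `[10, 6, 4]`, weights `[5, 4, 4]` and capacity `8`, the price is `1.5` and the reduced values are `[2.5, 0.0, -2.0]`: the first item is fully selected, the second is the fractional critical item, and the dual objective $1.5 \cdot 8 + 2.5 = 14.5$ matches the LP optimum $10 + 6 \cdot 3/4$. Between refreshes, such a price can be held fixed while the model trains on the reduced values.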

Paper Structure

This paper contains 35 sections, 4 theorems, 67 equations, 3 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

Assume (A1)–(A3). Then for every $\tau>0$ and every $\theta$, $\mathrm{Regret}(\theta) \;\le\; |\mathcal{G}|\,\bigl[\mathcal{L}_{\tau}(\theta, \hat{\boldsymbol{\lambda}})-\mathcal{L}_{\tau}^{*}\bigr] \;+\;O\!\bigl(e^{-\gamma/\tau}\bigr)$, where the $O\!\bigl(e^{-\gamma/\tau}\bigr)$ term is uniform in …
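The statement above does not define $\gamma$. Assuming, for illustration only, that $\gamma>0$ is a margin by which the best dual-adjusted score $\tilde c_{k^\star}$ in a group $g$ with $n_g$ options exceeds every other option's score $\tilde c_j$, the following standard softmax bound shows how a term of order $e^{-\gamma/\tau}$ can arise. It is a sketch of the intuition, not the paper's proof.

```latex
% Assumption: \tilde c_{k^\star} - \tilde c_j \ge \gamma > 0 for every j \ne k^\star in group g.
p_k = \frac{e^{\tilde c_k/\tau}}{\sum_{j} e^{\tilde c_j/\tau}}, \qquad
1 - p_{k^\star}
  = \frac{\sum_{j \ne k^\star} e^{(\tilde c_j - \tilde c_{k^\star})/\tau}}
         {1 + \sum_{j \ne k^\star} e^{(\tilde c_j - \tilde c_{k^\star})/\tau}}
  \le \sum_{j \ne k^\star} e^{(\tilde c_j - \tilde c_{k^\star})/\tau}
  \le (n_g - 1)\, e^{-\gamma/\tau}.
```

Summed over the groups in $\mathcal{G}$, the probability mass the softmax places off the hard argmax is at most $\sum_{g}(n_g-1)\,e^{-\gamma/\tau}$, which vanishes as $\tau \to 0$.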

Figures (3)

  • Figure 1: Training pipelines. (a) Two-stage: the training loss ignores decision quality. (b) DFL with solver-in-the-loop: training goes through an optimization oracle, either by differentiating through it (QPTL) or by using solution-based gradients (SPO), which incurs a solve at every training step. (c) DFL with Dual-Guided Loss: periodically refresh the duals $\boldsymbol{\lambda}$ from the downstream problem and, between refreshes, train on dual-adjusted soft decisions.
  • Figure 2: Training time vs test relative regret. Across Matching (sizes 10, 50) and Knapsack (sizes 24, 48), DGL variants reach competitive or better regret far faster than QPTL and SPO+.
  • Figure A1: Additional per-seed results for the Many-to-One Matching setting. Each panel corresponds to a distinct dataset generated with a different global seed.
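Pipeline (c) in Figure 1 can be sketched as a training loop that calls the solver only at refresh epochs. This is a schematic under stated assumptions: refreshes happen on epoch boundaries every `refresh_every` epochs, each instance gets one gradient step per epoch, and `solve_duals` and `gradient_step` are hypothetical callbacks standing in for the downstream solver and the surrogate update. The paper's own algorithms may schedule refreshes differently.

```python
from typing import Callable, List, Sequence, TypeVar

Instance = TypeVar("Instance")
Duals = List[float]


def train_with_periodic_refresh(
    instances: Sequence[Instance],
    epochs: int,
    refresh_every: int,
    solve_duals: Callable[[Instance], Duals],
    gradient_step: Callable[[Instance, Duals], None],
) -> int:
    """Recompute each instance's duals only at epochs 0, refresh_every,
    2 * refresh_every, ...; every other gradient step reuses the cached duals.
    Returns the total number of solver calls."""
    if refresh_every < 1:
        raise ValueError("refresh_every must be >= 1")
    cached: List[Duals] = [[] for _ in instances]
    solver_calls = 0
    for epoch in range(epochs):
        if epoch % refresh_every == 0:
            for i, inst in enumerate(instances):
                cached[i] = solve_duals(inst)  # the only place the solver runs
                solver_calls += 1
        for i, inst in enumerate(instances):
            gradient_step(inst, cached[i])  # surrogate update, no solver call
    return solver_calls
```

With $N$ instances, $E$ epochs and refresh period $K$, this sketch makes $\lceil E/K \rceil \cdot N$ solver calls, versus $E \cdot N$ for a method that solves at every step (the $K = 1$ case). For example, 3 instances, 10 epochs and $K = 4$ give 9 solver calls instead of 30, so increasing $K$ moves the cost toward plain supervised training.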

Theorems & Definitions (7)

  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof