Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems

Pierre-Cyril Aubin-Frankowski, Yohann De Castro, Axel Parmentier, Alessandro Rudi

TL;DR

The paper tackles generalization of surrogate policies for combinatorial optimization by smoothing a linear-optimization surrogate with Gaussian perturbations, which makes the risk differentiable and enables gradient-based learning. It decomposes the excess risk into perturbation bias, estimation error, and optimization error, and introduces the Uniform Weak (UW) moment property to quantify how the statistical model interacts with the normal-cone structure of the feasible polytope. The theory shows that, under mild conditions and with positive regularization $\varepsilon_0>0$, the UW property holds and the excess risk can be controlled, with explicit bias-variance-approximation tradeoffs characterized as functions of the perturbation scale $\lambda$, sample size $n$, and optimization complexity $M$ (via Kernel-SoS). The framework applies to contextual stochastic optimization and to industrially relevant problems such as stochastic vehicle scheduling, where smoothing enables tractable training and controlled generalization while keeping inference tractable via the linear oracle. Overall, the work provides non-asymptotic guarantees and principled guidance for choosing perturbation levels to balance training efficiency and generalization.
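
The three-way split named above can be pictured with a generic telescoping identity. The block below is only a schematic sketch and not the paper's statement: the notation is assumed for illustration, with $R_0$ the risk of the unperturbed policy, $R_\lambda$ the risk of the policy perturbed at scale $\lambda$, $\widehat{R}_{n,\lambda}$ its empirical version on $n$ samples, $w^\star$ a minimizer of $R_0$, $w_{n,\lambda}$ a minimizer of $\widehat{R}_{n,\lambda}$, and $\widehat{w}$ the parameter actually returned by the optimizer (minimizers are assumed to exist).

```latex
\begin{align*}
R_0(\widehat{w}) - R_0(w^\star)
  &= \underbrace{\bigl[R_0(\widehat{w}) - R_\lambda(\widehat{w})\bigr]
     + \bigl[R_\lambda(w^\star) - R_0(w^\star)\bigr]}_{\text{perturbation bias}} \\
  &\quad + \underbrace{\bigl[R_\lambda(\widehat{w}) - \widehat{R}_{n,\lambda}(\widehat{w})\bigr]
     + \bigl[\widehat{R}_{n,\lambda}(w^\star) - R_\lambda(w^\star)\bigr]}_{\text{estimation error}} \\
  &\quad + \underbrace{\bigl[\widehat{R}_{n,\lambda}(\widehat{w}) - \widehat{R}_{n,\lambda}(w_{n,\lambda})\bigr]}_{\text{optimization error}}
     + \underbrace{\bigl[\widehat{R}_{n,\lambda}(w_{n,\lambda}) - \widehat{R}_{n,\lambda}(w^\star)\bigr]}_{\le 0}.
\end{align*}
```

The last bracket is nonpositive because $w_{n,\lambda}$ minimizes $\widehat{R}_{n,\lambda}$, so bounding the first three groups bounds the excess risk. How each group scales with $\lambda$, $n$, and $M$ is what the paper's theorem quantifies; this identity alone says nothing about rates.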

Abstract

A recent line of structured learning methods has advanced the practical state-of-the-art for combinatorial optimization problems with complex, application-specific objectives. These approaches learn policies that couple a statistical model with a tractable surrogate combinatorial optimization oracle, so as to exploit the distribution of problem instances instead of solving each instance independently. A core obstacle is that the empirical risk is then piecewise constant in the model parameters. This hinders gradient-based optimization, and few theoretical guarantees have been provided so far. We address this issue by analyzing smoothed (perturbed) policies: adding controlled random perturbations to the direction used by the linear oracle yields a differentiable surrogate risk and improves generalization. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. The analysis hinges on a new Uniform Weak (UW) property capturing the geometric interaction between the statistical model and the normal fan of the feasible polytope; we show it holds under mild assumptions, and automatically when a minimal baseline perturbation is present. The framework covers, in particular, contextual stochastic optimization. We illustrate the scope of the results on applications such as stochastic vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
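
To make the piecewise-constant obstacle and its smoothing concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the feasible set is a hypothetical toy polytope (the unit square, given by its four vertices), the oracle maximizes $\langle \bm\theta, {\bm{y}} \rangle$ (the sign convention is assumed), and the function names `linear_oracle` and `perturbed_policy` are invented for this example. The perturbed policy is estimated by Monte Carlo with Gaussian noise of scale `lam`.

```python
import numpy as np

# Hypothetical toy feasible set: the four vertices of the unit square in R^2.
VERTICES = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])


def linear_oracle(theta, vertices=VERTICES):
    """Index of a vertex maximizing <theta, y> (argmax convention assumed)."""
    return int(np.argmax(vertices @ theta))


def perturbed_policy(theta, lam, n_samples=20000, seed=0, vertices=VERTICES):
    """Monte Carlo estimate of P(oracle(theta + lam * Z) = y) with Z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, theta.shape[0]))
    scores = (theta + lam * z) @ vertices.T  # shape (n_samples, n_vertices)
    winners = np.argmax(scores, axis=1)
    return np.bincount(winners, minlength=len(vertices)) / n_samples


if __name__ == "__main__":
    theta_a = np.array([0.1, 0.5])
    theta_b = np.array([0.2, 0.5])
    # The unperturbed oracle returns the same vertex for both directions,
    # so any risk built on it does not change between theta_a and theta_b.
    print(linear_oracle(theta_a), linear_oracle(theta_b))
    # The perturbed policy assigns probabilities that vary with theta.
    print(perturbed_policy(theta_a, lam=0.5))
    print(perturbed_policy(theta_b, lam=0.5))
```

Because the vertex probabilities vary continuously with `theta`, an expected cost computed from them does too, which is what makes gradient-based training possible. A larger `lam` flattens the distribution (more smoothing, more bias), while `lam` close to zero recovers the deterministic oracle.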

Paper Structure

This paper contains 36 sections, 16 theorems, 95 equations, 3 figures.

Key Result

Theorem 1

Under the conditions given in Section sec:conditions, the following holds true. Let $\varepsilon_0\geq 0$ and $\lambda>0$ be such that $\lambda\geq \varepsilon_0$. Let $\tau\in(0,1)$. There exists a constant $C>0$ that depends only on $\varepsilon_0$, $\tau$ and $f^0$ such that for any ${\bm{w}}\in$ ... where $s> {d_\mathcal{W}}/2$ is some tuning parameter on the order of regularity of the admissible ...

Figures (3)

  • Figure 1: Illustration of the stochastic vehicle scheduling policy.
  • Figure 2: Surrogate policy encoded by the statistical model ${\psi}_{\bm{w}}\,:\,{\bm{x}}\in\mathcal{X} \mapsto \bm\theta\in\mathds{R}^{d({\bm{x}})}$ with combinatorial optimization (CO) layer given by a linear program over solutions ${\bm{y}}\in\mathcal{Y}({\bm{x}})$.
  • Figure 3: Normal cone at point ${\bm{y}}_1$ to the polytope (left) and normal fan with internal radius $\rho$ at point $\bm\theta$ (right).

Theorems & Definitions (46)

  • Example 1
  • Example 2
  • Example 3
  • Definition 1: Surrogate policy
  • Remark 2: Generic case and an abuse of notation
  • Definition 3: Law of the perturbation
  • Remark 4: On forthcoming Gaussian assumption
  • Remark 5: Link with the internal radius
  • Definition 6: Perturbed surrogate policy probabilities
  • Remark 7
  • ...and 36 more