Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems
Pierre-Cyril Aubin-Frankowski, Yohann De Castro, Axel Parmentier, Alessandro Rudi
TL;DR
The paper tackles generalization of surrogate policies for combinatorial optimization by smoothing a linear-optimization surrogate with Gaussian perturbations, which yields a differentiable risk and enables gradient-based learning. It decomposes the excess risk into perturbation bias, estimation error, and optimization error, and introduces the Uniform Weak (UW) moment property to quantify how the statistical model interacts with the normal-cone structure of the feasible polytope. The theory shows that, under mild conditions and with positive regularization $\varepsilon_0>0$, the UW property holds and the excess risk can be controlled, with explicit bias-variance-approximation tradeoffs characterized as functions of the perturbation scale $\lambda$, sample size $n$, and optimization complexity $M$ (via Kernel-SoS). The framework applies to contextual stochastic optimization and to industrially relevant problems such as stochastic vehicle scheduling, where smoothing enables tractable training and controlled generalization while keeping inference tractable via the linear oracle. Overall, the work provides non-asymptotic guarantees and principled guidance for choosing perturbation levels to balance training efficiency and generalization.
Abstract
A recent line of structured learning methods has advanced the practical state-of-the-art for combinatorial optimization problems with complex, application-specific objectives. These approaches learn policies that couple a statistical model with a tractable surrogate combinatorial optimization oracle, so as to exploit the distribution of problem instances instead of solving each instance independently. A core obstacle is that the empirical risk is then piecewise constant in the model parameters. This hinders gradient-based optimization, and few theoretical guarantees have been provided so far. We address this issue by analyzing smoothed (perturbed) policies: adding controlled random perturbations to the direction used by the linear oracle yields a differentiable surrogate risk and improves generalization. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. The analysis hinges on a new Uniform Weak (UW) property capturing the geometric interaction between the statistical model and the normal fan of the feasible polytope; we show it holds under mild assumptions, and automatically when a minimal baseline perturbation is present. The framework covers, in particular, contextual stochastic optimization. We illustrate the scope of the results on applications such as stochastic vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
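
To make the smoothing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy feasible polytope given by an explicit vertex list (the 3-dimensional simplex) and a Monte Carlo estimate of the perturbed policy $\mathbb{E}[\arg\max_y \langle \theta + \lambda Z, y\rangle]$ with $Z$ standard Gaussian. The names `linear_oracle`, `perturbed_policy`, and `VERTICES` are hypothetical and chosen for illustration only.

```python
import random


def linear_oracle(theta, vertices):
    """Return the vertex maximizing the inner product with theta.

    Stand-in for a combinatorial solver: here the feasible set is small
    enough to enumerate its vertices explicitly (an assumption of this sketch).
    """
    return max(vertices, key=lambda y: sum(t * yi for t, yi in zip(theta, y)))


def perturbed_policy(theta, vertices, lam, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[linear_oracle(theta + lam * Z)], Z ~ N(0, I)."""
    rng = random.Random(seed)
    dim = len(theta)
    mean = [0.0] * dim
    for _ in range(n_samples):
        # Perturb the direction passed to the oracle, not the oracle itself.
        noisy = [t + lam * rng.gauss(0.0, 1.0) for t in theta]
        y = linear_oracle(noisy, vertices)
        for i in range(dim):
            mean[i] += y[i] / n_samples
    return mean


# Toy feasible set: vertices of the 3-dimensional probability simplex.
VERTICES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

if __name__ == "__main__":
    for shift in (-0.1, 0.0, 0.1):
        theta = [1.0 + shift, 1.0, 0.0]
        hard = linear_oracle(theta, VERTICES)                 # jumps between vertices
        soft = perturbed_policy(theta, VERTICES, lam=0.5)     # varies gradually
        print(shift, hard, [round(v, 3) for v in soft])
```

In this sketch the unperturbed oracle output jumps from one vertex to another as `theta` crosses a boundary of the normal fan, whereas the averaged output lies inside the polytope and changes gradually with `theta`. Larger values of `lam` give a smoother map at the cost of a larger perturbation bias, which is the tradeoff the bound quantifies.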
