You Shall Pass: Dealing with the Zero-Gradient Problem in Predict and Optimize for Convex Optimization

Grigorii Veviurko; Wendelin Böhmer; Mathijs de Weerdt

TL;DR

The paper addresses the zero-gradient problem in predict-and-optimize for convex optimization by showing that active constraints can induce large null spaces in the Jacobian of the optimal solution with respect to parameters. It combines a quadratic-programming inner problem with local smoothing of the feasible set and projection-distance regularization to create a tractable, informative Jacobian that enables effective gradient-based training. The authors prove a zero-gradient theorem, derive a simple diagonal Jacobian for the QP inner problem, and show that local smoothing yields non-decreasing task performance with small updates; empirically, the Smoothed QP method outperforms existing approaches in non-linear cases and matches linear-method performance where appropriate. This approach broadens the applicability of differentiable optimization in P&O and offers a practical, scalable tool for convex problems, including portfolio optimization, with potential extensions to broader convex and bi-level settings.

Abstract

Predict and optimize is an increasingly popular decision-making paradigm that employs machine learning to predict unknown parameters of optimization problems. Instead of minimizing the prediction error of the parameters, it trains predictive models using task performance as a loss function. The key challenge in training such models is the computation of the Jacobian of the solution of the optimization problem with respect to its parameters. For linear problems, this Jacobian is known to be zero or undefined; hence, approximations are usually employed. For non-linear convex problems, however, it is common to use the exact Jacobian. This paper demonstrates that the zero-gradient problem appears in the non-linear case as well: the Jacobian can have a sizeable null space, thereby causing the training process to get stuck in suboptimal points. Through formal proofs, this paper shows that smoothing the feasible set resolves this problem. Combining this insight with known techniques from the literature, such as quadratic programming approximation and projection distance regularization, a novel method to approximate the Jacobian is derived. In simulation experiments, the proposed method increases the performance in the non-linear case and at least matches the existing state-of-the-art methods for linear problems.
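The zero-gradient effect described above can be illustrated on a toy problem. The sketch below is not the authors' code: it assumes a box feasible set and a QP of the form "maximize $-\lVert x-\hat{w}\rVert^2$ over the box", whose solution is simply the projection of $\hat{w}$ onto the box. The function names (`solve_box_qp`, `numerical_jacobian`) and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def solve_box_qp(w, lo=0.0, hi=1.0):
    """Toy inner problem (assumption): max_x -||x - w||^2 s.t. lo <= x <= hi.
    Its solution is the Euclidean projection of w onto the box."""
    return np.clip(w, lo, hi)

def numerical_jacobian(solver, w, eps=1e-6):
    """Central finite-difference Jacobian of solver(w) with respect to w."""
    n = w.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (solver(w + step) - solver(w - step)) / (2 * eps)
    return jac

w_hat = np.array([1.7, 0.4, -0.3])   # predicted parameters (made-up values)
x_hat = solve_box_qp(w_hat)           # decision; two box constraints are active
jac = numerical_jacobian(solve_box_qp, w_hat)

# Gradient of a task loss with respect to the decision (made-up values).
grad_x = np.array([0.5, -1.0, 2.0])
grad_w = jac.T @ grad_x               # chain rule through the solver

print(x_hat)    # coordinates 0 and 2 sit on the boundary of the box
print(jac)      # diagonal, with zeros where a constraint is active
print(grad_w)   # signal reaches only the coordinate with no active constraint
```

In this toy setting the Jacobian is diagonal with a zero for every coordinate pinned to a bound, so the task gradient with respect to those predictions vanishes no matter how large `grad_x` is in those coordinates.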

Paper Structure (21 sections, 5 theorems, 36 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
Predict and optimize
Related work
Problem formulation
Differentiable optimization
The zero-gradient theorem
Quadratic programming approximation
Local smoothing
The training process
Experiments
Portfolio optimization
Comparison to linear methods
Conclusion
Proofs
1. $\Delta\hat{w}\in\mathcal{N}(\hat{x})$
...and 6 more sections

Key Result

Lemma 3.6

Let Assumptions 1-3 hold, and let $\alpha_i(\hat{w})$, $i\in I(\hat{x})$, denote the coefficients in the representation of the internal gradient as a combination of the normals of the active constraints. Suppose further that the strict complementary slackness condition holds, i.e., $\alpha_i(\hat{w})>0,\ \forall i\in I(\hat{x})$. Then the Jacobian $\nabla_{\hat{w}}x^\ast(\hat{w})$ exists at $\hat{w}$. Moreover,

Figures (5)

Figure 1: Gradient cones $\hat{x} + G(\hat{x})$ (orange cones) and internal gradients $\nabla_{x}f(\hat{x},\hat{w})$ (black arrows) at different points $\hat{x}$ (red dots) in different feasible sets $\mathcal{C}$ (blue cube and cylinder). The points $\hat{x}$ cannot be moved in the dimensions spanned by the cones.
Figure 2: Left: Illustration of the QP approximation. The internal gradient (black arrow) at the solution of the QP $\hat{x}$ (red point) is orthogonal to the feasible set $\mathcal{C}$ (blue area) and points towards the unconstrained maximum $\hat{w}$ (purple cross). Right: Illustration of the smoothed problem. The internal gradient (black arrow) is orthogonal to the smoothed feasible set $\mathcal{C}_r(\hat{x}, \hat{w})$ (green circle) at the decision $\hat{x}$ (red point).
Figure 3: Results on the standard portfolio optimization problem: (a) the final test regret of each algorithm for varying values of $\lambda$. The smoothed QP performs best. (b) Evolution of the $\ell_2$ norm of the gradient during training for $\lambda=0.1$ and $\lambda=1$. Unlike the standalone QP and differentiation of the true problem, the smoothed QP does not suffer from the zero-gradient issue.
Figure 4: The final test regret of linear P&O algorithms, QP approximation, and smoothed QP on four benchmark problems.
Figure 5: Example of randomly generated grid topology. Red triangles represent generator nodes, and purple squares represent loads.
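
The right panel of Figure 2 can be sketched numerically. The example below is an illustration, not the paper's implementation: it assumes the smoothed feasible set is a ball of radius `r` that touches the decision $\hat{x}$ and whose outward normal at $\hat{x}$ points towards $\hat{w}$, and it reuses a toy box projection to obtain $\hat{x}$. Whether this matches the paper's exact definition of $\mathcal{C}_r(\hat{x},\hat{w})$ is an assumption, and the names `project_onto_ball` and `ball_projection_jacobian` are hypothetical.

```python
import numpy as np

def project_onto_ball(w, center, r):
    """Euclidean projection of w onto the ball {x : ||x - center|| <= r}."""
    d = w - center
    dist = np.linalg.norm(d)
    if dist <= r:
        return w.copy()
    return center + r * d / dist

def ball_projection_jacobian(w, center, r):
    """Analytic Jacobian of the projection for a point w outside the ball:
    (r / ||w - center||) * (I - u u^T), with u the unit vector from center to w."""
    d = w - center
    dist = np.linalg.norm(d)
    u = d / dist
    return (r / dist) * (np.eye(w.size) - np.outer(u, u))

w_hat = np.array([1.7, 0.4, -0.3])        # predicted parameters (made-up values)
x_hat = np.clip(w_hat, 0.0, 1.0)           # toy decision: projection onto the unit box
r = 0.5                                    # smoothing radius (made-up value)

# Place the ball so that it touches x_hat and its normal there points to w_hat.
u = (w_hat - x_hat) / np.linalg.norm(w_hat - x_hat)
center = x_hat - r * u

x_smooth = project_onto_ball(w_hat, center, r)   # same decision as before
jac = ball_projection_jacobian(w_hat, center, r)
rank = np.linalg.matrix_rank(jac, tol=1e-9)

grad_x = np.array([0.5, -1.0, 2.0])        # task-loss gradient (made-up values)
grad_w = jac.T @ grad_x                    # gradient passed back to the predictor

print(x_smooth, rank)
print(grad_w)
```

Under these assumptions the decision is unchanged, while the Jacobian loses only the single direction `u` instead of every coordinate with an active box constraint, so the task gradient reaches all components of the prediction.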

Theorems & Definitions (10)

Definition 3.4
Lemma 3.6: Theorem 2.1 in Fiacco (1976)
Lemma 3.7
Theorem 3.8: Zero-gradient theorem
Lemma 3.9
Definition 3.10
Theorem 3.12
Proof of Lemma 4
Proof of Lemma 6
Proof of Theorem 9
