A General and Streamlined Differentiable Optimization Framework

Andrew W. Rosemberg, Joaquim Dias Garcia, François Pacaud, Robert B. Parker, Benoît Legat, Kaarthik Sundar, Russell Bent, Pascal Van Hentenryck

TL;DR

This work addresses differentiating through constrained optimization at scale by delivering a general, JuMP-native framework (DiffOpt.jl) that unifies modeling and differentiation. It differentiates the KKT system to provide forward- and reverse-mode sensitivities for smooth, potentially nonconvex problems, supported by a parameter-centric API that exposes derivatives with respect to named problem parameters. Key contributions include extending DiffOpt.jl to nonconvex NLPs, enabling objective sensitivities with respect to explicit parameters, and preserving seamless solver integration via JuMP/MathOptInterface. Demonstrations on energy dispatch, mean–variance portfolio optimization, and nonlinear robot inverse kinematics illustrate practical applicability to learning, calibration, and design within standard JuMP workflows.

Abstract

Differentiating through constrained optimization problems is increasingly central to learning, control, and large-scale decision-making systems, yet practical integration remains challenging due to solver specialization and interface mismatches. This paper presents a general and streamlined framework, an updated DiffOpt.jl, that unifies modeling and differentiation within the Julia optimization stack. The framework computes forward- and reverse-mode solution and objective sensitivities for smooth, potentially nonconvex programs by differentiating the KKT system under standard regularity assumptions. A first-class, JuMP-native, parameter-centric API allows users to declare named parameters and obtain derivatives directly with respect to them, even when a parameter appears in multiple constraints and objectives, eliminating the brittle bookkeeping of coefficient-level interfaces. We illustrate these capabilities on convex and nonconvex models, including economic dispatch, mean-variance portfolio selection with conic risk constraints, and nonlinear robot inverse kinematics. Two companion studies further demonstrate impact at scale: gradient-based iterative methods for strategic bidding in energy markets and Sobolev-style training of end-to-end optimization proxies using solver-accurate sensitivities. Together, these results demonstrate that differentiable optimization can be deployed as a routine tool for experimentation, learning, calibration, and design, without deviating from standard JuMP modeling practices and while retaining access to a broad ecosystem of solvers.
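To make the phrase "differentiating the KKT system" concrete, the following is a minimal sketch in plain Python, not the DiffOpt.jl implementation or its API. It assumes a hypothetical two-generator dispatch problem, $\min \tfrac{1}{2}a_1 g_1^2 + \tfrac{1}{2}a_2 g_2^2$ subject to $g_1 + g_2 = d$, with illustrative cost coefficients `a1`, `a2` and demand `d` that do not come from the paper. Because this toy problem has only an equality constraint, its KKT conditions are linear, and the forward sensitivities with respect to $d$ follow from one extra linear solve with the same KKT matrix.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dispatch_kkt_sensitivity(a1, a2, d):
    """Return (z, dz_dd) with z = (g1, g2, lam) for the toy dispatch problem.

    KKT residual F(z, d) = 0 (hypothetical sign convention, lam = dJ/dd):
        a1*g1 - lam  = 0
        a2*g2 - lam  = 0
        g1 + g2 - d  = 0
    """
    # Jacobian of the KKT residual with respect to z = (g1, g2, lam).
    K = [[a1, 0.0, -1.0],
         [0.0, a2, -1.0],
         [1.0, 1.0, 0.0]]
    # The residual is linear in z here, so the solution is one linear solve.
    z = solve_linear(K, [0.0, 0.0, d])
    # Implicit differentiation: K * dz/dd = -dF/dd, where dF/dd = (0, 0, -1).
    dF_dd = [0.0, 0.0, -1.0]
    dz_dd = solve_linear(K, [-v for v in dF_dd])
    return z, dz_dd


def objective(a1, a2, g1, g2):
    """Toy quadratic generation cost."""
    return 0.5 * a1 * g1 ** 2 + 0.5 * a2 * g2 ** 2


# Illustrative values only.
z, dz_dd = dispatch_kkt_sensitivity(a1=1.0, a2=3.0, d=100.0)
print("g1, g2, lam:", z)
print("dg1/dd, dg2/dd, dlam/dd:", dz_dd)
```

In this sketch the multiplier `lam` plays the role of the objective sensitivity $\partial J/\partial d$, which can be checked against a finite difference of `objective` at two nearby demands. For nonlinear or inequality-constrained problems the same idea applies at the solution returned by a solver, but the KKT Jacobian then depends on the solution and on the active set.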

Paper Structure

This paper contains 17 sections, 1 theorem, 18 equations, 4 figures.

Key Result

Theorem 1

Let $F : \mathbb{R}^{n} \times \mathbb{R}^{\ell} \rightarrow \mathbb{R}^{n}$ be a continuously differentiable function and let $\mathbf{y}^* \in \mathbb{R}^n$, $p_0 \in \mathbb{R}^\ell$ form a solution of the following system of $n$ equations: $F(\mathbf{y}^*, p_0) = 0$. If the Jacobian matrix $\nabla_x F(\mathbf{y}^*, p_0)$ is invertible, then there exists a neighborhood $U$ of $p_0$ and a unique continuously differentiable

Figures (4)

  • Figure 1: Optimal dispatch (left), marginal electricity price $\lambda(d)=\partial J/\partial d$ (center, logarithmic scale), and forward sensitivities $\partial g_i/\partial d$ (right) for demands ranging from $\approx 0$ to 300 MWh. The vertical dashed lines mark where the individual plant capacities are reached: $g_2$ reaches its capacity at 148 MWh and $g_1$ at 230 MWh.
  • Figure 2: Impact of the risk limit $\sigma_{\max}$ on Markowitz portfolios. Left: predicted in-sample return versus realized out-of-sample return. Right: the out-of-sample loss $L(x)$ together with the absolute gradient $|\partial L/\partial\sigma_{\max}|$ obtained from DiffOpt.jl. The gradient tells the practitioner which way—and how aggressively—to adjust $\sigma_{\max}$ to reduce forecast error; its value is computed in one reverse-mode call without re-solving the optimization for perturbed risk limits.
  • Figure 3: Planned vs. observed reality for a robot arm.
  • Figure 4: Left: spectral-norm heat map $\bigl\lVert\partial\boldsymbol{\theta}/\partial(x,y)\bigr\rVert_2$ for a two-link arm; bright rings mark near-singular poses. Right: normalized precision error of the first-order approximation derived from the computed sensitivities.
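The quantity plotted in Figure 4 can be illustrated with a small closed-form sketch. It is an assumption-laden simplification, not the paper's method: the paper obtains $\partial\boldsymbol{\theta}/\partial(x,y)$ by differentiating a nonlinear inverse-kinematics program, whereas the code below uses the textbook forward kinematics of a planar two-link arm and inverts its Jacobian directly. The link lengths `l1`, `l2` and all function names are hypothetical. The sketch shows why the norm blows up near singular poses: the determinant of the forward Jacobian is $l_1 l_2 \sin\theta_2$, which vanishes when the arm is fully extended or fully folded.

```python
import math


def fk_jacobian(theta1, theta2, l1=1.0, l2=1.0):
    """Jacobian d(x, y)/d(theta1, theta2) of a planar two-link arm.

    Forward kinematics (standard textbook form):
        x = l1*cos(theta1) + l2*cos(theta1 + theta2)
        y = l1*sin(theta1) + l2*sin(theta1 + theta2)
    """
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def ik_sensitivity(theta1, theta2, l1=1.0, l2=1.0):
    """d(theta)/d(x, y) as the inverse of the forward Jacobian.

    Raises ZeroDivisionError at exactly singular poses (sin(theta2) == 0).
    """
    (a, b), (c, d) = fk_jacobian(theta1, theta2, l1, l2)
    det = a * d - b * c  # equals l1 * l2 * sin(theta2)
    return [[d / det, -b / det],
            [-c / det, a / det]]


def spectral_norm_2x2(M):
    """Largest singular value of a 2x2 matrix, from its Frobenius norm and determinant."""
    (a, b), (c, d) = M
    frob2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = max(frob2 * frob2 - 4.0 * det * det, 0.0)  # guard against round-off
    return math.sqrt((frob2 + math.sqrt(disc)) / 2.0)


# Compare a well-conditioned pose with a nearly extended (near-singular) one.
regular = spectral_norm_2x2(ik_sensitivity(0.0, math.pi / 2))
near_singular = spectral_norm_2x2(ik_sensitivity(0.0, 0.01))
print("norm at theta2 = pi/2:", regular)
print("norm at theta2 = 0.01:", near_singular)
```

Evaluating `spectral_norm_2x2(ik_sensitivity(...))` over a grid of reachable targets would reproduce the qualitative shape of such a heat map under these assumptions, with large values concentrated near the radii $l_1 + l_2$ and $|l_1 - l_2|$.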

Theorems & Definitions (1)

  • Theorem 1