Efficient and Modular Implicit Differentiation

Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

TL;DR

This work introduces automatic implicit differentiation, a modular framework that can be attached to any solver: the user only supplies an optimality mapping $F$ capturing the optimality conditions of the problem. By applying the implicit function theorem to $F$ and differentiating $F$ via autodiff, the framework obtains the Jacobian of the optimization solution without reimplementing solvers, unrolling iterations, or hand-deriving problem-specific formulas. The approach recovers known implicit differentiation schemes, makes new optimality mappings easy to define, and comes with Jacobian-precision guarantees. It is validated empirically on bi-level optimization tasks and on sensitivity analysis in molecular dynamics. The method reduces implementation effort for large, real-world problems such as hyperparameter tuning, dataset distillation, and dictionary learning.

Abstract

Automatic differentiation (autodiff) has revolutionized machine learning. It allows complex computations to be expressed by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention, with applications such as optimization layers and bi-level problems such as hyper-parameter optimization and meta-learning. However, implicit differentiation has so far remained difficult for practitioners to use, as it often requires tedious case-by-case mathematical derivations and implementations. In this paper, we propose automatic implicit differentiation, an efficient and modular approach for implicit differentiation of optimization problems. In our approach, the user defines directly in Python a function $F$ capturing the optimality conditions of the problem to be differentiated. Once this is done, we leverage autodiff of $F$ and the implicit function theorem to automatically differentiate the optimization problem. Our approach thus combines the benefits of implicit differentiation and autodiff. It is efficient, as it can be added on top of any state-of-the-art solver, and modular, as the optimality condition specification is decoupled from the implicit differentiation mechanism. We show that seemingly simple principles allow us to recover many existing implicit differentiation methods and to create new ones easily. We demonstrate the ease of formulating and solving bi-level optimization problems using our framework. We also showcase an application to the sensitivity analysis of molecular dynamics.
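
The mechanism summarized in the abstract can be written out in one step. The block below is the standard implicit function theorem argument rather than a quotation from the paper. It assumes that $x^\star(\theta)$ denotes the solution, that optimality is expressed as a root condition on $F$, and that $\partial_1 F$ is invertible at the solution. The names $A$ and $B$ follow Theorem 1 as listed under Key Result, while $J$ is shorthand introduced here for the Jacobian $\partial x^\star(\theta)$.

$$
\begin{aligned}
F(x^\star(\theta), \theta) &= 0 \\
\partial_1 F(x^\star(\theta), \theta)\, \partial x^\star(\theta) + \partial_2 F(x^\star(\theta), \theta) &= 0 \\
\underbrace{-\partial_1 F(x^\star(\theta), \theta)}_{A \in {\mathbb R}^{d \times d}} \; \underbrace{\partial x^\star(\theta)}_{J \in {\mathbb R}^{d \times n}} &= \underbrace{\partial_2 F(x^\star(\theta), \theta)}_{B \in {\mathbb R}^{d \times n}}
\end{aligned}
$$

The second line follows from differentiating the first with respect to $\theta$ via the chain rule. Computing $J$ therefore reduces to solving the linear system $AJ = B$, where $A$ and $B$ are obtained by autodiff of $F$.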

Paper Structure

This paper contains 62 sections, 4 theorems, 77 equations, 17 figures, 2 tables.

Key Result

Theorem 1

Let $F:{\mathbb R}^d \times {\mathbb R}^n \to {\mathbb R}^d$ be continuously differentiable. If there are $\alpha, \beta, \gamma, \varepsilon, R>0$ s.t. $A = -\partial_1 F$ and $B = \partial_2 F$ satisfy, for all $v\in{\mathbb R}^d$, $\theta \in {\mathbb R}^n$ and $x$ s.t. $\|x - x^\star(\theta)\| \

Figures (17)

  • Figure 1: Adding implicit differentiation on top of a ridge regression solver. The function $f(x, \theta)$ defines the objective function and the mapping $F$, here simply equation \ref{eq:stationary_cond}, captures the optimality conditions. Our decorator @custom_root automatically adds implicit differentiation to the solver for the user, overriding JAX's default behavior. The last line evaluates the Jacobian at $\theta = 10$.
  • Figure 2: Implementation of the proximal gradient fixed point \ref{eq:proximal_grad_fp} with step size $\eta=1$.
  • Figure 3: Jacobian estimate errors. The empirical error of implicit differentiation closely follows the theoretical upper bound. Unrolling achieves a much worse error for a comparable iterate error.
  • Figure 4: CPU runtime comparison of implicit differentiation and unrolling for hyperparameter optimization of multiclass SVMs for multiple problem sizes. Error bars represent 90% confidence intervals. (a) Mirror descent (MD) solver, with MD fixed point for differentiation. (b) Proximal gradient (PG) solver, with PG fixed point for differentiation. (c) Block coordinate descent solver; for implicit differentiation we obtain $x^\star(\theta)$ by BCD but perform differentiation with the MD and PG fixed points. This shows that the solver and fixed point can be independently chosen.
  • Figure 5: Distilled dataset $\theta \in {\mathbb R}^{k \times p}$ obtained by solving \ref{eq:bilevel_distillation}.
  • ...and 12 more figures
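
The ridge regression setting of Figure 1 can be illustrated with a minimal JAX sketch. This is not the paper's @custom_root decorator: it is a hand-rolled version of the same idea, in which $F$ is taken to be the gradient of the objective in $x$ and the Jacobian is recovered by solving $AJ = B$ with $A = -\partial_1 F$ and $B = \partial_2 F$. The toy data `X` and `y`, the exact form of the objective, and the helper names `ridge_solver` and `implicit_jacobian` are assumptions made for illustration only.

```python
import jax
import jax.numpy as jnp

# Use double precision so the comparison at the end is tight.
jax.config.update("jax_enable_x64", True)

# Hypothetical toy data (not from the paper).
X = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = jnp.array([1.0, 2.0, 3.0])


def f(x, theta):
    """Ridge objective (assumed form): least squares plus an L2 penalty."""
    residual = X @ x - y
    return 0.5 * jnp.sum(residual ** 2) + 0.5 * theta * jnp.sum(x ** 2)


# Optimality mapping: the stationarity condition F(x, theta) = grad_x f = 0.
F = jax.grad(f, argnums=0)


def ridge_solver(theta):
    """Any solver could be used here; this one uses the closed form."""
    d = X.shape[1]
    return jnp.linalg.solve(X.T @ X + theta * jnp.eye(d), X.T @ y)


def implicit_jacobian(F, x_star, theta):
    """Solve A J = B with A = -d1 F and B = d2 F, both obtained by autodiff."""
    A = -jax.jacobian(F, argnums=0)(x_star, theta)  # shape (d, d)
    B = jax.jacobian(F, argnums=1)(x_star, theta)   # shape (d,) for scalar theta
    return jnp.linalg.solve(A, B)


theta = 10.0
x_star = ridge_solver(theta)

# Jacobian of the solution w.r.t. theta via the implicit function theorem.
J_implicit = implicit_jacobian(F, x_star, theta)

# Reference: differentiate through the closed-form solver directly.
J_reference = jax.jacobian(ridge_solver)(theta)

print(J_implicit)
print(J_reference)
```

Note that `implicit_jacobian` never looks inside `ridge_solver`: it only needs the solution `x_star` and the mapping `F`, which is what allows the solver and the optimality mapping to be chosen independently. For large $d$ one would typically avoid materializing $A$ and solve the linear system with matrix-free products instead; the dense solve here is only for readability.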

Theorems & Definitions (9)

  • Definition 1
  • Theorem 1
  • proof: Proof of Theorem \ref{thm:jacob}
  • Corollary 1: Jacobian precision for gradient descent fixed point
  • proof: Proof of Corollary \ref{cor:precision-gd}
  • Corollary 2: Jacobian precision for proximal gradient descent fixed point
  • proof: Proof of Corollary \ref{cor:precision-prox}
  • Theorem 2
  • proof