Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates

Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

TL;DR

This work addresses computing derivatives through fixed-point maps when the map $\Phi$ is nondifferentiable but piecewise smooth and contractive. It analyzes two deterministic methods, ITD and AID-FP, proving non-asymptotic linear convergence guarantees with rates that improve on prior work, and shows that AID-FP often outperforms ITD. It then introduces NSID, a stochastic implicit-differentiation method for compositional fixed-point problems in which the inner map is accessible only through stochastic estimators, and proves an $O(1/k)$ rate for its Jacobian-vector product approximation. The framework is applied to bilevel optimization, yielding deterministic and stochastic convergence rates and providing scalable, reliable gradient information through nonsmooth fixed-point equations. Experiments on elastic-net and data-poisoning tasks validate the theoretical insights and demonstrate NSID's practical value.
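To make ITD and AID-FP concrete, here is a minimal NumPy sketch on a small synthetic elastic-net problem. It assumes the fixed-point map is one proximal-gradient step $\Phi(w,\lambda_1)=\mathrm{prox}_{\eta\lambda_1\|\cdot\|_1}(w-\eta\nabla f(w))$ with $f(w)=\frac{1}{2n}\|Xw-y\|^2+\frac{\lambda_2}{2}\|w\|^2$, and differentiates the fixed point with respect to the $\ell_1$ weight $\lambda_1$. The problem sizes, step size, the derivative chosen at the kinks of soft-thresholding, and all function names are illustrative assumptions, not the paper's experimental setup. ITD propagates derivatives through the iterates in forward mode; AID-FP first computes an approximate fixed point and then solves $v=\partial_w\Phi\,v+\partial_{\lambda_1}\Phi$ by fixed-point iteration. Both are compared with the closed-form derivative $v_S=-A_{SS}^{-1}\operatorname{sign}(w_S)$, which holds once the support $S$ and the signs are fixed.

```python
import numpy as np

def soft_threshold(z, t):
    # Prox of t*||.||_1: a continuous selection of the smooth pieces z - t, 0, z + t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def phi_and_partials(w, A, b, lam1, eta):
    """Phi(w, lam1) = prox_{eta*lam1*||.||_1}(w - eta*(A w - b)) and one selection of its partials."""
    z = w - eta * (A @ w - b)
    active = (np.abs(z) > eta * lam1).astype(float)  # coordinates on the nonzero pieces (0 chosen at kinks)

    def dphi_dw(v):
        # Matrix-vector product with diag(active) @ (I - eta*A).
        return active * (v - eta * (A @ v))

    dphi_dlam = -eta * np.sign(z) * active  # derivative w.r.t. the l1 weight lam1
    return soft_threshold(z, eta * lam1), dphi_dw, dphi_dlam

def itd(A, b, lam1, eta, t):
    # Iterative differentiation: forward-mode chain rule through t fixed-point iterations.
    w, dw = np.zeros(A.shape[0]), np.zeros(A.shape[0])
    for _ in range(t):
        w_next, dphi_dw, dphi_dlam = phi_and_partials(w, A, b, lam1, eta)
        dw = dphi_dw(dw) + dphi_dlam
        w = w_next
    return w, dw

def aid_fp(A, b, lam1, eta, t, k):
    # AID-FP: t steps for the fixed point w, then k fixed-point steps on v = dPhi/dw v + dPhi/dlam.
    w = np.zeros(A.shape[0])
    for _ in range(t):
        w, _, _ = phi_and_partials(w, A, b, lam1, eta)
    _, dphi_dw, dphi_dlam = phi_and_partials(w, A, b, lam1, eta)
    v = np.zeros_like(w)
    for _ in range(k):
        v = dphi_dw(v) + dphi_dlam
    return w, v

def closed_form(w, A):
    # With support S and signs fixed: A_SS v_S = -sign(w_S), and v = 0 off S.
    S = w != 0
    v = np.zeros_like(w)
    v[S] = -np.linalg.solve(A[np.ix_(S, S)], np.sign(w[S]))
    return v

rng = np.random.default_rng(0)
n, d, lam1, lam2 = 50, 20, 0.1, 0.1
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:5] = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(n)
A = X.T @ X / n + lam2 * np.eye(d)  # Hessian of the smooth part; lam2 > 0 gives strong convexity
b = X.T @ y / n
eta = 1.0 / np.linalg.eigvalsh(A).max()  # step 1/L makes Phi a contraction in w

w_ref, _ = aid_fp(A, b, lam1, eta, t=5000, k=0)  # accurate fixed point for the reference
v_ref = closed_form(w_ref, A)

errs_itd, errs_aid = [], []
for t in (10, 100, 1000):
    _, dw_itd = itd(A, b, lam1, eta, t)
    _, v_aid = aid_fp(A, b, lam1, eta, t, k=t)
    errs_itd.append(np.linalg.norm(dw_itd - v_ref))
    errs_aid.append(np.linalg.norm(v_aid - v_ref))
    print(f"t={t:5d}  ITD err={errs_itd[-1]:.2e}  AID-FP err={errs_aid[-1]:.2e}")
```

Once the support is identified, the partial derivatives used by AID-FP coincide with those at the exact fixed point, so its error then shrinks at the contraction rate; this is consistent with the role of support identification described in the Figure 1 caption.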

Abstract

We study the problem of efficiently computing the derivative of the fixed point of a parametric nondifferentiable contraction map. This problem has wide applications in machine learning, including hyperparameter optimization, meta-learning, and data poisoning attacks. We analyze two popular approaches: iterative differentiation (ITD) and approximate implicit differentiation (AID). A key challenge in the nonsmooth setting is that the chain rule no longer holds. We build upon the work by Bolte et al. (2022), who prove linear convergence of nonsmooth ITD under a piecewise Lipschitz smooth assumption. In the deterministic case, we provide a linear rate for AID and an improved linear rate for ITD, both of which closely match those in the smooth setting. We further introduce NSID, a new stochastic method to compute the implicit derivative when the contraction map is defined as the composition of an outer map and an inner map that is accessible only through a stochastic unbiased estimator. We establish rates for the convergence of NSID, encompassing the best available rates in the smooth setting. We also present illustrative experiments confirming our analysis.
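The compositional setting behind NSID can be illustrated with a simplified stochastic sketch. This is not the paper's NSID algorithm: it assumes $\Phi(w,\lambda_1)=G(H(w),\lambda_1)$ with inner map $H(w)=w-\eta\nabla f(w)$ (an elastic-net gradient step) and outer map $G(u,\lambda_1)=\mathrm{prox}_{\eta\lambda_1\|\cdot\|_1}(u)$, computes the approximate fixed point $w$ and $H(w)$ deterministically, and replaces only the Jacobian-vector products $\partial_w H\,v$ by unbiased one-sample estimates inside a stochastic fixed-point iteration with decreasing step sizes $\gamma_k=2/((1-q)(k+k_0))$, where $q$ is the contraction factor. All names, problem sizes, and step-size constants are hypothetical choices.

```python
import numpy as np

def soft_threshold(u, t):
    # Outer map G(u, lam1) with t = eta * lam1.
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

rng = np.random.default_rng(1)
n, d, lam1, lam2 = 200, 10, 0.1, 1.0
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [1.0, -2.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(n)
A = X.T @ X / n + lam2 * np.eye(d)  # full-batch Hessian of the smooth part
b = X.T @ y / n
eigs = np.linalg.eigvalsh(A)
eta = 1.0 / eigs[-1]
q = 1.0 - eta * eigs[0]  # contraction factor of Phi in w

# Approximate fixed point w = G(H(w)) via deterministic proximal-gradient steps (assumption).
w = np.zeros(d)
for _ in range(2000):
    w = soft_threshold(w - eta * (A @ w - b), eta * lam1)

def stochastic_implicit_jvp(w, num_iters, rng):
    u = w - eta * (A @ w - b)                        # H(w), computed exactly in this sketch
    active = (np.abs(u) > eta * lam1).astype(float)  # selection of dG/du at u
    c = -eta * np.sign(u) * active                   # dG/dlam1 at u (H does not depend on lam1)
    v = np.zeros(d)
    k0 = 2.0 / (1.0 - q)
    for k in range(num_iters):
        i = rng.integers(n)                          # uniform sample index
        x = X[i]
        jvp = v - eta * (x * (x @ v) + lam2 * v)     # unbiased one-sample estimate of dH/dw @ v
        gamma = 2.0 / ((1.0 - q) * (k + k0))         # decreasing step size with gamma_0 = 1
        v = (1.0 - gamma) * v + gamma * (active * jvp + c)
    return v

# Closed-form implicit derivative on the support S, for comparison.
S = w != 0
v_ref = np.zeros(d)
v_ref[S] = -np.linalg.solve(A[np.ix_(S, S)], np.sign(w[S]))

for iters in (100, 1000, 20000):
    v = stochastic_implicit_jvp(w, iters, np.random.default_rng(2))
    rel_err = np.linalg.norm(v - v_ref) / np.linalg.norm(v_ref)
    print(f"iters={iters:6d}  relative error={rel_err:.3e}")
```

Because the update is linear in the sampled Jacobian, each step direction is an unbiased estimate of the deterministic AID-FP step at the current $v$, which is what lets the decreasing step sizes average out the sampling noise.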

Paper Structure

This paper contains 40 sections, 15 theorems, 126 equations, 2 figures, and 3 algorithms.

Key Result

Theorem 2.4

Let $F\colon U\subset\mathbb{R}^p \to \mathbb{R}^d$ be a continuous selection of definable and continuously differentiable mappings $F_1, \dots, F_r\colon U \to \mathbb{R}^d$. Then $F$ is definable if and only if $I_F\colon\mathbb{R}^p\rightrightarrows [r]$ is definable, and in such case $D_{F}^s$ i
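A scalar example may help with the objects in Theorem 2.4. The sketch below treats soft-thresholding as a continuous selection of the affine pieces $x-t$, $0$, $x+t$, computes the active index set $I_F(x)$ (the pieces that coincide with $F$ at $x$), and returns the interval spanned by the active pieces' derivatives. It assumes that $D_F^s(x)$ is the convex hull of the derivatives of the active pieces; if the paper's Definition 2.3 uses essentially active pieces instead, the two notions agree on this example. The helper names are hypothetical.

```python
import numpy as np

def active_indices(x, F, pieces, tol=1e-12):
    # I_F(x): indices of the smooth pieces that coincide with F at x.
    fx = F(x)
    return [i for i, Fi in enumerate(pieces) if abs(Fi(x) - fx) <= tol]

def selection_derivative_interval(x, F, pieces, grads):
    # Scalar case: the convex hull of the active pieces' derivatives is an interval.
    vals = [grads[i](x) for i in active_indices(x, F, pieces)]
    return min(vals), max(vals)

t = 0.5  # threshold (illustrative value)
pieces = [lambda x: x - t, lambda x: 0.0, lambda x: x + t]
grads = [lambda x: 1.0, lambda x: 0.0, lambda x: 1.0]

def F(x):
    # Soft-thresholding, a continuous selection of the three pieces above.
    return np.sign(x) * max(abs(x) - t, 0.0)

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, active_indices(x, F, pieces), selection_derivative_interval(x, F, pieces, grads))
```

At the kinks $x=\pm t$ two pieces are active and the interval is $[0,1]$; elsewhere a single piece is active and the derivative is unique.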

Figures (2)

  • Figure 1: AID vs. ITD for synthetic elastic-net. $t$ is the number of steps used to find an approximate fixed point, and the dashed vertical line marks the step at which the support is identified. AID-FP converges faster than ITD; after support identification there is a wide gap between the two methods, as anticipated by our theoretical bounds. AID-CG does not converge in the plot on the right, probably due to sensitivity to numerical errors.
  • Figure 2: Stochastic implicit differentiation for elastic net (left) and data poisoning (right) with constant (const) and decreasing (dec) step sizes. Mean (solid line) and geometric standard deviation (shaded region) of the approximation error over 10 runs. For this specific choice of $\lambda$, SID does not converge on elastic net, and it diverges on data poisoning (hence it is not reported there), while NSID converges faster than the deterministic AID-FP in the early iterations. Decreasing step sizes appear to be the more favorable choice.

Theorems & Definitions (30)

  • Definition 2.1: Conservative Derivatives
  • Definition 2.2: Excess
  • Definition 2.3
  • Theorem 2.4
  • Lemma 2.5
  • Lemma 3.2
  • Remark 3.3
  • Theorem 4.1: nonsmooth ITD and AID-FP Rates
  • Remark 5.3
  • Theorem 5.5
  • ...and 20 more