
Differentiation of inertial methods for optimizing smooth parametric function

Jean-Jacques Godeme

TL;DR

This work analyzes how inertial optimization methods for smooth, strongly convex parametric problems can be differentiated with respect to a parameter $\theta$ using automatic differentiation. It establishes existence and uniqueness of the minimizer $x^*(\theta)$, proves global convergence and local linear rates for a broad class of inertial schemes, derives explicit formulas for the derivative $\partial_{\theta}X^*(\theta)$, and shows that the derivatives of the iterates converge to it. A key contribution is the derivative-stability result: $\partial_{\theta}X_k(\theta)$ converges to $\partial_{\theta}X^*(\theta)$ without requiring global Lipschitz bounds on second-order derivatives, and the derivative converges at a local linear rate that includes a vanishing error term. The paper also provides numerical experiments on least-squares problems and a log-exponential model that illustrate the convergence of the states and their derivatives and validate the theoretical results. Overall, it offers a rigorous framework for differentiating inertial methods in parametric optimization, with implications for hyperparameter tuning and bilevel optimization.
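For a $C^2$, strongly convex objective $f(x,\theta)$, the derivative of the minimizer follows from the implicit function theorem, and differentiating an inertial recursion in $\theta$ gives a linear recursion for $J_k = \partial_\theta x_k(\theta)$. The sketch below is a standard derivation rather than a reproduction of the paper's formulas; it uses the heavy-ball update with fixed step $\alpha>0$ and momentum $\beta$ as an assumed representative of the inertial class.

```latex
\begin{aligned}
&\nabla_x f(x^*(\theta),\theta) = 0
\;\Longrightarrow\;
\nabla^2_{xx} f(x^*(\theta),\theta)\,\partial_\theta x^*(\theta) + \nabla^2_{x\theta} f(x^*(\theta),\theta) = 0,\\
&\partial_\theta x^*(\theta) = -\big[\nabla^2_{xx} f(x^*(\theta),\theta)\big]^{-1}\,\nabla^2_{x\theta} f(x^*(\theta),\theta),\\[4pt]
&x_{k+1} = x_k - \alpha\,\nabla_x f(x_k,\theta) + \beta\,(x_k - x_{k-1}),\\
&J_{k+1} = J_k - \alpha\big(\nabla^2_{xx} f(x_k,\theta)\,J_k + \nabla^2_{x\theta} f(x_k,\theta)\big) + \beta\,(J_k - J_{k-1}).
\end{aligned}
```

If $x_k \to x^*(\theta)$ and $J_k \to J$, the momentum terms cancel in the limit and, after dividing by $\alpha$, $J$ satisfies the same linear equation as $\partial_\theta x^*(\theta)$. Strong convexity makes $\nabla^2_{xx} f$ invertible, so $J = \partial_\theta x^*(\theta)$. This only identifies the possible limit; showing that $J_k$ actually converges is what a derivative-stability result has to establish.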

Abstract

In this paper, we consider the minimization of a $C^2$-smooth and strongly convex objective depending on a given parameter, a setting that arises in many practical applications. We assume that the problem is solved with a family of inertial methods that covers a broad range of existing, well-known inertial methods. Our main goal is to analyze the derivative of this algorithm as an infinite iterative process in the sense of "automatic" differentiation. This procedure is very common and has gained attention recently. From a pure optimization perspective and under some mild premises, we show that any sequence generated by these inertial methods converges to the unique minimizer of the problem, which depends on the parameter. Moreover, we show a local linear convergence rate of the generated sequence. Concerning the differentiation of the scheme, we prove that the derivative of the sequence with respect to the parameter converges to the derivative of the limit of the sequence, showing that any such sequence is "derivative stable". Finally, we investigate the rate at which this convergence occurs and show that it is locally linear with an error term tending to zero.
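The idea of differentiating the iterative process itself can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's method: heavy ball stands in for the inertial class, the objective is a hypothetical ridge-regularized least-squares problem $f(x,\theta) = \tfrac12\|Ax-b\|^2 + \tfrac{\theta}{2}\|x\|^2$ (which is $C^2$ and strongly convex for $\theta>0$), and the helper `heavy_ball_with_derivative` and the random data are illustrative. The derivative $J_k = \partial_\theta x_k(\theta)$ is propagated by hand alongside $x_k$, as forward-mode automatic differentiation would do. A long run is compared with the implicit-function-theorem derivative $\partial_\theta x^*(\theta) = -(A^\top A + \theta I)^{-1} x^*(\theta)$, and a short run is compared with central finite differences of the unrolled map $\theta \mapsto x_5(\theta)$.

```python
import numpy as np


def heavy_ball_with_derivative(A, b, theta, alpha, beta, n_iter):
    """Run heavy ball on f(x, theta) = 0.5*||A x - b||^2 + 0.5*theta*||x||^2 and
    propagate J_k = d x_k / d theta alongside the iterates (forward mode, by hand)."""
    n = A.shape[1]
    H0 = A.T @ A                           # Hessian of the data term; full Hessian is H0 + theta*I
    Atb = A.T @ b
    x, x_prev = np.zeros(n), np.zeros(n)   # x_0 = x_{-1} = 0 do not depend on theta,
    J, J_prev = np.zeros(n), np.zeros(n)   # so J_0 = J_{-1} = 0
    for _ in range(n_iter):
        grad = H0 @ x + theta * x - Atb    # grad_x f(x_k, theta)
        dgrad = H0 @ J + theta * J + x     # Hessian times J_k, plus d/dtheta grad_x f = x_k
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
        J, J_prev = J - alpha * dgrad + beta * (J - J_prev), J
    return x, J


# Hypothetical data: a random least-squares problem with ridge parameter theta = 0.5.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
theta = 0.5

# Classical heavy-ball tuning from the extreme Hessian eigenvalues (held fixed in theta).
H = A.T @ A + theta * np.eye(10)
eigs = np.linalg.eigvalsh(H)
mu, L = eigs[0], eigs[-1]
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

# Limit and its derivative: x* = H^{-1} A^T b and dx*/dtheta = -H^{-1} x*.
x_star = np.linalg.solve(H, A.T @ b)
dx_star = -np.linalg.solve(H, x_star)

# Long run: both the iterates and their derivatives approach the limit.
x_K, J_K = heavy_ball_with_derivative(A, b, theta, alpha, beta, 1000)
state_err = np.linalg.norm(x_K - x_star)
deriv_err = np.linalg.norm(J_K - dx_star)

# Short run: J_5 is the exact derivative of the unrolled map theta -> x_5(theta).
h = 1e-5
x_plus, _ = heavy_ball_with_derivative(A, b, theta + h, alpha, beta, 5)
x_minus, _ = heavy_ball_with_derivative(A, b, theta - h, alpha, beta, 5)
_, J_5 = heavy_ball_with_derivative(A, b, theta, alpha, beta, 5)
fd_err = np.linalg.norm(J_5 - (x_plus - x_minus) / (2 * h))

print(f"state error {state_err:.1e}, derivative error {deriv_err:.1e}, "
      f"finite-difference gap {fd_err:.1e}")
```

The short run separates two questions: $J_5$ matches finite differences because it is the derivative of the algorithm, while $J_K$ matching $\partial_\theta x^*(\theta)$ for large $K$ is the derivative-stability property.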

Paper Structure

This paper contains 25 sections, 10 theorems, 36 equations, 1 figure, 1 algorithm.

Key Result

Lemma 2.2

Let us consider problem (eq:param-optim) for all $\theta\in\Theta$. Under Premise prem_A, we have

Figures (1)

  • Figure 1: Automatic differentiation of the inertial methods (Case 1: inertial method; Case 2: gradient descent).
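Figure 1 contrasts automatic differentiation of an inertial method with that of gradient descent. A comparable experiment can be sketched as follows. It is not necessarily the paper's setup, and its numbers should not be read as the figure's results. The sketch assumes heavy ball as the inertial method, a hypothetical diagonal ridge problem with Hessian eigenvalues from 1.5 to 100.5, and textbook step sizes for quadratics. The helper `derivative_iterations` (hypothetical) counts iterations until $\|\partial_\theta x_k(\theta) - \partial_\theta x^*(\theta)\|$ drops below a tolerance; setting the momentum to zero recovers gradient descent.

```python
import numpy as np

# Hypothetical diagonal test problem f(x, theta) = 0.5*||A x - b||^2 + 0.5*theta*||x||^2.
A = np.diag(np.linspace(1.0, 10.0, 5))
b = np.ones(5)
theta = 0.5
H = A.T @ A + theta * np.eye(5)       # Hessian in x: eigenvalues from 1.5 to 100.5
Atb = A.T @ b
x_star = np.linalg.solve(H, Atb)
J_star = -np.linalg.solve(H, x_star)  # dx*/dtheta from the implicit function theorem


def derivative_iterations(alpha, beta, tol=1e-8, max_iter=20000):
    """Iterations until ||J_k - J_star|| <= tol, where J_k = dx_k/dtheta.
    beta = 0 reduces the inertial update to plain gradient descent."""
    x, x_prev = np.zeros(5), np.zeros(5)
    J, J_prev = np.zeros(5), np.zeros(5)
    for k in range(1, max_iter + 1):
        grad = H @ x - Atb            # grad_x f(x_k, theta)
        dgrad = H @ J + x             # Hessian times J_k, plus d/dtheta grad_x f = x_k
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
        J, J_prev = J - alpha * dgrad + beta * (J - J_prev), J
        if np.linalg.norm(J - J_star) <= tol:
            return k
    return max_iter


mu, L = np.linalg.eigvalsh(H)[[0, -1]]
# Case 2 (gradient descent): step 2/(L + mu), no momentum.
gd_iters = derivative_iterations(2.0 / (L + mu), 0.0)
# Case 1 (inertial): heavy-ball tuning for quadratics.
hb_iters = derivative_iterations(
    4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2,
    ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2,
)
print(f"gradient descent: {gd_iters} iterations, heavy ball: {hb_iters} iterations")
```

With these tunings, the asymptotic linear rate of the derivative error is about $\sqrt{\beta}$ per step for heavy ball versus $(L-\mu)/(L+\mu)$ for gradient descent, up to polynomial factors, so the inertial run needs fewer iterations on this ill-conditioned instance.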

Theorems & Definitions (23)

  • Remark 2.1
  • Lemma 2.2: Existence
  • Lemma 2.3
  • Remark 2.4
  • Proposition 2.5: Global convergence of the iterates
  • Remark 2.6
  • Remark 2.7
  • Proposition 2.8: Local linear convergence
  • Definition 3.1
  • Definition 3.2: Derivative stable
  • ...and 13 more