Remarks on the Polyak-Lojasiewicz inequality and the convergence of gradient systems

Arthur Castello B. de Oliveira, Leilei Cui, Eduardo D. Sontag

TL;DR

The paper develops a framework for generalizing the Polyak-Łojasiewicz inequality (PLI) using nonlinear comparison functions and analyzes how these generalizations shape gradient-flow convergence. It then applies the framework to continuous-time LQR (CT-LQR) policy optimization, proving that the CT-LQR cost cannot satisfy a global PLI and that its convergence behavior depends on the region: the gradient stays bounded for high gains yet grows without bound near the stability boundary. A scalar LQR study illustrates the two resulting regimes: slow, linear-then-exponential convergence when the flow starts at high gains, and fast exponential convergence when it starts near the stability border. The results show how weaker PL conditions govern convergence profiles and motivate future work on proximal gradient flows with $L_1$ regularization. Overall, the work clarifies the limitations of global PL guarantees for CT-LQR and gives a more detailed view of gradient-flow dynamics under generalized PL inequalities.

Abstract

This work explores generalizations of the Polyak-Lojasiewicz inequality (PLI) and their implications for the convergence behavior of gradient flows in optimization problems. Motivated by the continuous-time linear quadratic regulator (CT-LQR) policy optimization problem, where only a weaker version of the PLI is characterized in the literature, this work shows that while weaker conditions are sufficient for global convergence to, and optimality of, the set of critical points of the cost function, the "profile" of the gradient flow solution can change significantly depending on which "flavor" of inequality the cost satisfies. After a general theoretical analysis, we focus on fitting the CT-LQR policy optimization problem to the proposed framework, showing that, in fact, it can never satisfy a PLI in its strongest form. We follow up our analysis with a brief discussion on the difference between continuous- and discrete-time LQR policy optimization, and end the paper with some intuition on the extension of this framework to optimization problems with L1 regularization that are solved through proximal gradient flows.

Paper Structure

This paper contains 13 sections, 13 theorems, 90 equations, 5 figures.

Key Result

Lemma 1

For any proper function $f:\mathcal{X}\rightarrow\mathbb{R}$, the solution of the gradient flow \ref{eq:gradflow} initialized at any point $x\in\mathcal{X}$ is precompact.

Figures (5)

  • Figure 1: Diagram of the hierarchy between types of comparison functions and their relationship with the different types of Polyak-Łojasiewicz inequality. Notice in particular that while all functions that satisfy a PŁI also have a class $\mathcal{K}_\infty$ lower bound (as presented in Definition \ref{def:zoo}), the converse is not necessarily true. Similarly, satisfying a sgPŁI implies that there exists a $\mathcal{PD}$ lower bound, but the converse is not true. Also notice that the newly introduced class $\mathcal{K}_{\mathrm{SAT}}$ lower bound lies between the gPŁI and the sgPŁI, and provides a better convergence guarantee. Finally, the $\ell$PŁI stands isolated in the graph, but one should note that it sustains the same convergence and robustness properties as the gPŁI, so long as the disturbance is small enough to keep the solution in a neighborhood of the optimum.
  • Figure 2: Gradient flow trajectory for a cost $f$ that satisfies the conditions in Lemma \ref{lem:linexp-bnd}. Notice that for each given value of $\epsilon$, one can find a line and an exponential such that the line upper-bounds the solution while it is outside the level set $\mathcal{X}_\epsilon$, and the exponential upper-bounds it when it is inside the level set. Furthermore, notice that smaller values of $\epsilon$ result in a tighter bound for the exponential section but a looser bound for the linear section, and the opposite occurs when $\epsilon$ is larger. Finally, the thin dotted blue line illustrates how the exponential bound for $\epsilon_1$ quickly becomes conservative if $t<\underline t$, pointing to linear-exponential as a tighter bound than purely exponential.
  • Figure 3: Illustration of the dual behavior of the LQR cost function \ref{eq:costCTLQR} for the scalar case. In (a) we plot the squared norm of the gradient $\|\nabla J\|^2$ as a function of the feedback gain $k$, while in (b) we plot the largest exponential rate of convergence $m(k)$ as defined in \ref{eq:me}. In both plots the dotted vertical line indicates the optimal $k^*$. Notice that for $k>k^*$ (right side of the dotted line) the gradient is bounded above and $m(k)$ quickly goes to zero, while for $k<k^*$ (left side of the dotted line) the value of the gradient and the best exponential rate of convergence both quickly diverge to infinity as $k$ approaches the border of instability.
  • Figure 4: Simulation results for the gradient flow of the scalar LQR policy optimization with $a=r=q=1$. Both simulations (a) and (b) were initialized such that $J(k(0))-J(k^*)\approx 8$; however, (a) was initialized with $k(0)>k^*$ and (b) with $k(0)<k^*$. Notice that in (a) the convergence is "linear-exponential" as described in Section \ref{ssc:ConvPLI} since, as can be seen in Fig. \ref{fig:gradJscalar}, for $k>k^*$, $\nabla J(k)$ is bounded above. In (b), on the other hand, the convergence is much quicker and exponential, because for $k\in[a,k^*]$ the exponential rate of convergence $m(\epsilon)$ defined in \ref{eq:me} is bounded away from zero.
  • Figure 5: Visualization of how the discretization step affects the global exponential rate of convergence for the scalar discrete-time LQR policy optimization problem. Notice that for any discretization step $h>0$, there exists a $\mu=\underline{m_d}(h)$ that provides global exponential convergence guarantees for the solution. However, that rate of convergence goes to zero as $h$ goes to zero, which is compatible with the observation that CT-LQR does not have a global exponential rate of convergence.
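
The experiment described in the Figure 4 caption can be approximated with a short script. The sketch below is not the authors' code. It assumes the scalar plant $\dot x = ax + bu$ with $b=1$ (the caption only fixes $a=r=q=1$), the feedback $u=-kx$, and a cost normalized to $x(0)=1$, which gives $J(k) = (q+rk^2)/(2(bk-a))$ for $bk>a$. All function names and the forward-Euler integrator are illustrative choices.

```python
import math

# Scalar plant dx/dt = a*x + b*u with feedback u = -k*x.
# a = q = r = 1 follows the Figure 4 caption; b = 1 is an assumption.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0


def cost(k):
    """LQR cost for x(0) = 1 (assumed normalization); requires b*k > a."""
    if B * k <= A:
        raise ValueError("gain k is not stabilizing")
    return (Q + R * k * k) / (2.0 * (B * k - A))


def grad(k):
    """Derivative dJ/dk of the cost above."""
    return (R * B * k * k - 2.0 * R * A * k - B * Q) / (2.0 * (B * k - A) ** 2)


# Optimal gain: the stabilizing root of r*b*k^2 - 2*r*a*k - b*q = 0.
K_STAR = (A + math.sqrt(A * A + B * B * Q / R)) / B


def gains_with_gap(gap):
    """Both stabilizing gains (k_low < k* < k_high) with J(k) - J(k*) = gap."""
    level = cost(K_STAR) + gap
    # J(k) = level  <=>  r*k^2 - 2*level*b*k + (q + 2*level*a) = 0
    disc = math.sqrt((level * B) ** 2 - R * (Q + 2.0 * level * A))
    return (level * B - disc) / R, (level * B + disc) / R


def time_to_gap(k0, tol=1e-3, dt=1e-4, t_max=200.0):
    """Forward-Euler integration of the gradient flow dk/dt = -dJ/dk.

    Returns the first time at which J(k) - J(k*) <= tol.
    """
    j_star = cost(K_STAR)
    k, steps = k0, 0
    while cost(k) - j_star > tol:
        if steps * dt >= t_max:
            raise RuntimeError("tolerance not reached before t_max")
        k -= dt * grad(k)
        steps += 1
    return steps * dt


if __name__ == "__main__":
    k_low, k_high = gains_with_gap(8.0)  # same initial cost gap on both sides
    print("k* =", K_STAR)
    print("start above k*:", k_high, "time to tol:", time_to_gap(k_high))
    print("start below k*:", k_low, "time to tol:", time_to_gap(k_low))
```

Under these assumptions the derivative reduces to $dJ/dk = 1/2 - 1/(k-1)^2$, so it stays below $1/2$ for every $k>k^*$ and the flow started at the high gain can only approach the optimum at a bounded speed. The flow started near the stability border sees a very large gradient and reaches the same tolerance much sooner, which matches the qualitative contrast between panels (a) and (b).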

Theorems & Definitions (20)

  • Lemma 1
  • Theorem 1: Łojasiewicz's theorem [lojasiewicz1984gradients]
  • Lemma 2: [de2024convergence, de2024remarks]
  • Definition 1: $\mu$-global Polyak-Łojasiewicz inequality
  • Definition 2: $\mu$-global exponential stability
  • Lemma 3
  • Definition 3: Semi-global PŁI
  • Lemma 4
  • Definition 4: $\epsilon$-local PŁI
  • Definition 5
  • ...and 10 more
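
The pairing of a $\mu$-global PŁI (Definition 1) with $\mu$-global exponential stability (Definition 2) follows a classical argument, sketched here for orientation. The normalization of $\mu$ is an assumption, since the paper's own constant may differ by a factor of 2. The symbol $f^*$ denotes the minimum value of $f$, and the gradient flow is taken to be $\dot x = -\nabla f(x)$.

$$
\begin{aligned}
&\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,\big(f(x)-f^*\big) \quad \text{for all } x\in\mathcal{X} \\
&\Longrightarrow\ \frac{d}{dt}\big(f(x(t))-f^*\big) = -\|\nabla f(x(t))\|^2 \le -2\mu\,\big(f(x(t))-f^*\big) \\
&\Longrightarrow\ f(x(t))-f^* \le e^{-2\mu t}\,\big(f(x(0))-f^*\big).
\end{aligned}
$$

The last implication is Grönwall's inequality. When the lower bound on $\|\nabla f\|^2$ is only a nonlinear comparison function of $f(x)-f^*$, the same differential inequality still forces the cost to decrease, but the resulting decay need not be exponential.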