Gradient sampling algorithm for subsmooth functions

Dimitris Boskos, Jorge Cortés, Sonia Martínez

TL;DR

The paper studies non-smooth optimization where the objective is the max over a parameterized family, $f(x)=\max_{\theta\in\Theta}F(x,\theta)$, with inner maximization preventing closed-form objective values and gradients. It extends gradient sampling to a modified gradient sampling (mGS) framework that uses an oracle to approximate inner maximizers and samples nearby gradients to form a descent direction, proving almost-sure convergence to Clarke stationary points under the weaker assumption that $f$ is lower-$\mathcal{C}^2$ (subsmooth) on an open full-measure set. The authors provide convergence proofs, invariance guarantees to convex domains, and a distributionally robust coverage optimization example showing that the objective is lower-$\mathcal{C}^2$ and that iterates can be guided toward a desired convex set without adding hard constraints. Numerical experiments demonstrate robustness to density uncertainty, with two- and multi-agent setups showing convergence where standard gradient methods may fail, and a penalty term ensuring attractivity inside a convex region.
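
For reference, the notion of Clarke stationarity mentioned above can be written out as follows. This is the standard characterization for a locally Lipschitz $f$, not a formula quoted from the paper; the symbols $\bar\partial f$ (Clarke generalized gradient), $\mathrm{co}$ (convex hull), and $D$ (the full-measure set where $f$ is differentiable) are notation assumed here and may differ from the authors' own.

$$
\bar\partial f(x)=\mathrm{co}\Big\{\lim_{i\to\infty}\nabla f(x_i)\;:\;x_i\to x,\ x_i\in D\Big\},
\qquad
x^\ast \text{ is Clarke stationary}\iff 0\in\bar\partial f(x^\ast).
$$

Gradient sampling replaces the limit set by finitely many gradients evaluated at random points near $x$, which is why only differentiability almost everywhere is needed to evaluate them.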

Abstract

This paper considers non-smooth optimization problems where we seek to minimize the pointwise maximum of a continuously parameterized family of functions. Since the objective function is given as the solution to a maximization problem, neither its values nor its gradients are available in closed form, which calls for approximation. Our approach hinges upon extending the so-called gradient sampling algorithm, which approximates the Clarke generalized gradient of the objective function at a point by sampling its derivative at nearby locations. This allows us to select descent directions around points where the function may fail to be differentiable and establish algorithm convergence to a stationary point from any initial condition. Our key contribution is to prove this convergence by alleviating the requirement on continuous differentiability of the objective function on an open set of full measure. We further provide assumptions under which a desired convex subset of the decision space is rendered attractive for the iterates of the algorithm.
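
The idea in the abstract can be illustrated with a minimal sketch of a basic gradient sampling loop. It is not the authors' mGS algorithm: the toy family `F`, the grid `THETAS`, the grid-search `oracle`, the Frank-Wolfe helper `min_norm_in_hull`, and all parameter values are assumptions chosen for illustration. The direction is the (approximate) minimum-norm element of the convex hull of gradients sampled near the current point, followed by a backtracking line search.

```python
import numpy as np

THETAS = np.linspace(-1.0, 1.0, 21)  # hypothetical grid standing in for Theta

def F(x, theta):
    # Toy family (an assumption): max over theta equals |x[0]| + 0.5*||x||^2.
    return theta * x[0] + 0.5 * (x @ x)

def oracle(x):
    # Stand-in oracle: approximate inner maximizer found by grid search.
    return THETAS[np.argmax(THETAS * x[0])]

def f(x):
    return F(x, oracle(x))

def grad_f(x):
    # Gradient of F(., theta*) at x; equals the gradient of f where f is differentiable.
    g = x.copy()
    g[0] += oracle(x)
    return g

def min_norm_in_hull(G, iters=500):
    # Frank-Wolfe with exact line search: approximate min-norm point of conv{rows of G}.
    w = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(iters):
        g = G.T @ w
        j = np.argmin(G @ g)          # vertex minimizing the linearized objective
        d = G[j] - g
        dd = d @ d
        if dd < 1e-16:
            break
        step = np.clip(-(g @ d) / dd, 0.0, 1.0)
        w = (1.0 - step) * w
        w[j] += step
    return G.T @ w

def gradient_sampling(x0, eps=0.5, nu=0.1, m=10, iters=300, beta=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(iters):
        # Sample m points uniformly in the ball of radius eps around x.
        u = rng.normal(size=(m, n))
        u /= np.linalg.norm(u, axis=1, keepdims=True)
        r = eps * rng.uniform(size=(m, 1)) ** (1.0 / n)
        pts = np.vstack([x, x + r * u])
        G = np.array([grad_f(p) for p in pts])
        g = min_norm_in_hull(G)
        gn2 = g @ g
        if np.sqrt(gn2) <= nu:
            # Approximately stationary at this scale: tighten radius and tolerance.
            eps *= 0.5
            nu *= 0.5
            continue
        # Backtracking (Armijo) line search along -g.
        t, fx = 1.0, f(x)
        while t > 1e-12 and f(x - t * g) > fx - beta * t * gn2:
            t *= 0.5
        if t > 1e-12:
            x = x - t * g
        else:
            eps *= 0.5  # no acceptable step found: shrink the sampling radius
    return x

x0 = np.array([2.0, -1.5])
x_star = gradient_sampling(x0)
print(f(x0), f(x_star), x_star)
```

In this toy problem the minimizer is the origin, where $f$ is not differentiable. Sampling on both sides of the kink $x_1=0$ is what lets the convex hull of gradients contain a near-zero vector there, whereas a single gradient never would.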

Paper Structure

This paper contains 17 sections, 16 theorems, 143 equations, 3 figures.

Key Result

Proposition 2.1

(Subgradient/Clarke generalized gradients relationship RTR-RJBW:98). Assume that $f$ is locally Lipschitz on the open set $\mathcal{O}\subset \mathbb{R}^{n}$. Then, for all $x\in\mathcal{O}$. $\blacksquare$

Figures (3)

  • Figure 1: (a) shows the part of the two-agent domain for which we obtained an explicit formula of the coverage cost and the lines where this cost may be discontinuous, given by $p_1(x)=p_2(x)$. The green line $x_1+x_2-4=0$, on which $p_1(x)-p_2(x)=0$, separates Case (i) (below the line) from Case (ii) (above the line), whereas the yellow lines determine the additional zeros of $p_1(x)-p_2(x)$ for each respective case (plotted dashed in its complement). (b) shows a plot of this cost for the histogram value bounds $\theta_1^-=\theta_2^-=0$ and $\theta_1^+=\theta_2^+=0.45$.
  • Figure 2: (a) shows the agent position sequence generated by the mGS algorithm, which converges to the minimum of the cost function. (b) shows the corresponding sequence generated by a simple gradient descent, which fails to find a descent direction after a certain number of iterations and cannot proceed further.
  • Figure 3: The plot shows the position sequence of five agents generated by the mGS algorithm. Here, the lower and upper bounds $\theta_k^-$, $\theta_k^+$, $k=1,\ldots,6$ are different. All agents are aligned vertically at each iteration and the two outermost ones lie initially outside the support $[0,6]$ of the uncertain density. The penalty term $F_{\rm penalty}$ ensures that the agents converge to an optimal worst-case configuration that lies inside the interval $[0,6]$.

Theorems & Definitions (33)

  • Proposition 2.1
  • Definition 2.2
  • Definition 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Proposition 2.6
  • Remark 3.1
  • Remark 3.4
  • Theorem 3.5
  • Proposition 3.6
  • ...and 23 more