Locally Linear Convergence for Nonsmooth Convex Optimization via Coupled Smoothing and Momentum
Reza Rahimi Baghbadorani, Sergio Grammatico, Peyman Mohajerin Esfahani
TL;DR
This work addresses nonsmooth convex optimization, including objectives that are the sum of two nonsmooth terms. It introduces an adaptive smoothing technique whose smoothing parameter is coupled with the momentum parameter of a Nesterov-like accelerated method. This coupling yields a global $O(1/k)$ convergence rate (equivalently, $O(1/\varepsilon)$ iteration complexity) and, under an $\infty$-locally strong convexity condition, a local linear convergence phase. The approach extends to composite objectives with two prox-friendly nonsmooth terms and performs well empirically on Lasso, MaxCut SDP, nuclear-norm minimization, and $\ell_1$-regularized MPC, where a transient $O(1/k^2)$ rate is observed in practice before the asymptotic linear phase. Two insights stand out: the two-phase convergence behavior follows from the smoothing rule, and the initial smoothing level $\mu_0$ matters for reaching the linear phase. Overall, coupling smoothing with momentum gives fast, robust performance on a broad class of nonsmooth problems relevant to control and signal processing.
Abstract
We propose an adaptive accelerated smoothing technique for nonsmooth convex optimization problems in which the smoothing update rule is coupled with the momentum parameter. We also extend the setting to the case where the objective function is the sum of two nonsmooth functions. Regarding convergence, we establish a global sublinear rate of $O(1/k)$, which is provably optimal for the studied class of functions, along with a local linear rate if the nonsmooth term satisfies a so-called locally strong convexity condition. We validate the performance of our algorithm on several problem classes, including regression with the $\ell_1$-norm (the Lasso problem), sparse semidefinite programming (the MaxCut problem), nuclear-norm minimization with application to model-free fault diagnosis, and $\ell_1$-regularized model predictive control, to showcase the benefits of the coupling. An interesting observation is that, although our global convergence result guarantees $O(1/k)$ convergence, we consistently observe a practical transient convergence rate of $O(1/k^2)$, followed by asymptotic linear convergence as anticipated by the theoretical result. This two-phase behavior can also be explained in view of the proposed smoothing rule.
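To make the idea of coupling the smoothing level to the momentum parameter concrete, the following is a minimal sketch on the Lasso problem. It is an illustration under stated assumptions, not the authors' algorithm: the $\ell_1$-norm is replaced by its standard Huber smoothing, the momentum sequence `t` is the usual FISTA-style recursion, and the coupling rule `mu = mu0 / t` is a hypothetical choice made here only to show a smoothing level that shrinks as momentum grows. The function names and the default `mu0` are likewise assumptions.

```python
import numpy as np


def huber_grad(x, mu):
    """Gradient of the Huber smoothing of the l1-norm at smoothing level mu."""
    return np.clip(x / mu, -1.0, 1.0)


def lasso_objective(A, b, lam, x):
    """Original (nonsmooth) Lasso objective 0.5*||Ax - b||^2 + lam*||x||_1."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()


def coupled_smoothing_agd(A, b, lam, mu0=1.0, iters=500):
    """Accelerated gradient on a smoothed Lasso objective.

    ASSUMPTION: the rule mu = mu0 / t is illustrative only; the paper's
    actual smoothing update is not reproduced here.
    """
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    L_f = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the least-squares gradient
    for _ in range(iters):
        mu = mu0 / t                 # hypothetical coupling: smoothing tied to momentum
        L = L_f + lam / mu           # Lipschitz constant of the smoothed gradient
        grad = A.T @ (A @ y - b) + lam * huber_grad(y, mu)
        x_next = y - grad / L        # gradient step at the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x


# Small synthetic example (data are made up for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[:3] = [3.0, -2.0, 1.5]
b = A @ x_true
x_hat = coupled_smoothing_agd(A, b, lam=0.1)
print("objective at zero:", lasso_objective(A, b, 0.1, np.zeros(10)))
print("objective at x_hat:", lasso_objective(A, b, 0.1, x_hat))
```

In this sketch, a larger `mu0` makes the early iterations smoother (larger steps) at the cost of a coarser approximation of the $\ell_1$ term, which is one way to see why the initial smoothing level can influence the observed convergence phases.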
