A Universally Optimal Primal-Dual Method for Minimizing Heterogeneous Compositions
Aaron Zoll, Benjamin Grimmer
TL;DR
This work addresses the minimization of convex composite objectives of the form $F(x)=g_0(x)+h(g_1(x),\dots,g_m(x))+u(x)$, where each component function $g_j$ may independently range from smooth to nonsmooth and from convex to strongly convex. It introduces a universal primal-dual method, UFCM, and its restarted variant, R-UFCM, along with two new aggregate curvature notions, $L_{\varepsilon,r}^{\mathtt{ADA}}$ and $\mu_{\varepsilon}^{\mathtt{ADA}}$, that merge the heterogeneous structure of the components into single parameters for analysis. The authors prove optimal first-order convergence rates across smooth, Hölder-smooth, and uniformly convex component settings, and show that restarting yields accelerated rates when uniform convexity is present. They also connect the framework to functionally constrained optimization and recover known results as special cases. The analysis relies on a generalized Q-analysis in an extended Lagrangian, sliding techniques that decouple gradient and proximal computations, and universal Hölder-smoothness arguments, so the method can be applied as a black box to a wide class of convex problems. Overall, the results give a unified treatment of constrained, finite-sum, and robust formulations with provably optimal complexity.
Abstract
This paper proposes a universal algorithm for convex minimization problems of the composite form $g_0(x)+h(g_1(x),\dots, g_m(x)) + u(x)$. We allow each $g_j$ to independently range from being nonsmooth Lipschitz to smooth, and from convex to strongly convex, as described by notions of Hölder continuous gradients and uniform convexity. Note that, although the objective is built from a heterogeneous combination of such structured components, it does not necessarily possess smoothness, Lipschitzness, or any favorable structure overall other than convexity. Regardless, we provide a universal optimal method in terms of oracle access to (sub)gradients of each $g_j$. The key insight enabling our optimal universal analysis, and a core technical contribution, is the construction of two new constants, the Approximate Dualized Aggregate smoothness and strong convexity, which combine the benefits of each heterogeneous structure into single quantities amenable to analysis. As a key application, fixing $h$ as the indicator function of the nonpositive orthant, this model readily captures functionally constrained minimization of $g_0(x)+u(x)$ subject to $g_j(x)\leq 0$. In particular, our algorithm and analysis are directly inspired by the smooth constrained minimization method of Zhang and Lan and consequently recover and generalize their accelerated guarantees.
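As a sketch of the constrained special case mentioned above, take $h$ to be the indicator of the nonpositive orthant. The symbol $\iota$ and the constants $L_j$, $p_j$, $\mu_j$, $q_j$ below are notation assumed here for illustration; the paper's own normalization may differ.

$$
h(z)=\iota_{\mathbb{R}^m_{\leq 0}}(z)=
\begin{cases}
0 & \text{if } z_j\leq 0 \text{ for all } j=1,\dots,m,\\
+\infty & \text{otherwise,}
\end{cases}
\qquad\text{so that}\qquad
\min_x\; g_0(x)+h(g_1(x),\dots,g_m(x))+u(x)
\;=\;
\min_x\;\bigl\{\, g_0(x)+u(x) \;:\; g_j(x)\leq 0,\ j=1,\dots,m \,\bigr\}.
$$

One common way to formalize the heterogeneous component assumptions, stated here as a convention rather than as the paper's exact definitions, is the following pair of conditions, where $\nabla g_j$ denotes any (sub)gradient:

$$
\|\nabla g_j(x)-\nabla g_j(y)\|\leq L_j\|x-y\|^{p_j},\quad p_j\in[0,1],
\qquad
g_j(y)\geq g_j(x)+\langle \nabla g_j(x),\,y-x\rangle+\frac{\mu_j}{q_j+1}\|y-x\|^{q_j+1},\quad q_j\geq 1.
$$

Under this convention, $p_j=0$ corresponds to the nonsmooth Lipschitz case and $p_j=1$ to the smooth case, while $q_j=1$ with $\mu_j>0$ corresponds to strong convexity and $\mu_j=0$ to plain convexity.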
