Complexity Bounds for Smooth Multiobjective Optimization
Phillipe R. Sampaio
TL;DR
This work develops an information-based, oracle-complexity theory for finding $\varepsilon$-Pareto stationary points in smooth multiobjective optimization by introducing a robust non-degenerate lifting that embeds scalar hard instances into MOO with distinct objectives and non-singleton Pareto fronts. The lifting ensures the Pareto stationarity gap $\mathcal{G}(x)$ dominates the scalar gradient measure, enabling transfer of sharp single-objective lower bounds to the MOO setting. The authors establish tight linear lower bounds for strongly convex MOO, a separation between oblivious one-step and oblivious span methods in the convex regime, and universal lower bounds for adaptive methods via geometric arguments, as well as a nonconvex extension with Lipschitz gradients. They also provide upper bounds via scalarization and rate comparisons, showing that fixed-scalarization AGD achieves matching order rates, and discuss open questions on last-iterate behavior, stochastic settings, and broader extensions.
Abstract
We study the oracle complexity of finding $\varepsilon$-Pareto stationary points in smooth multiobjective optimization with $m$ objectives. Progress is measured by the Pareto stationarity gap $\mathcal{G}(x)$, the norm of the best convex combination of objective gradients. Our analysis relies on a non-degenerate lifting that embeds hard single-objective instances into MOO instances with distinct objectives and non-singleton Pareto fronts while preserving lower bounds on $\mathcal{G}$. We establish: (i) in the $μ$-strongly convex case, any span first-order method has worst-case linear convergence no faster than $\exp(-Θ(T/\sqrtκ))$ after $T$ oracle calls, yielding $Θ(\sqrtκ\log(1/\varepsilon))$ iterations and matching accelerated upper bounds; (ii) in the convex case, an $Ω(1/T)$ min-iterate lower bound for oblivious one-step methods and a universal last-iterate lower bound $Ω(1/T^2)$ for oblivious span methods via polynomial-degree arguments, and we further show this latter bound is loose (for general adaptive methods) by importing geometric lower bounds to obtain an $Ω(1/T)$ min-iterate lower bound for general adaptive first-order methods; (iii) in the nonconvex case with $L$-Lipschitz gradients, an $Ω(\sqrt{L}/(T+1))$-type lower bound on $\mathcal{G}$ (tight in order), implying $Ω(1/\varepsilon^2)$ iterations to reach $\mathcal{G}(x)\le\varepsilon$ up to natural scaling.
