Block majorization-minimization with diminishing radius for constrained nonsmooth nonconvex optimization
Hanbaek Lyu, Yuchen Li
TL;DR
This work develops Block Majorization-Minimization (BMM) for constrained nonsmooth nonconvex optimization, in which majorizing surrogates are minimized cyclically, one block at a time. It introduces BMM-DR, a trust-region variant with diminishing radius (DR), and proves an iteration complexity of $\widetilde{O}((1+L_g+\rho^{-1})\varepsilon^{-2})$ for standard BMM, which improves to $\widetilde{O}((1+L_g)\varepsilon^{-2})$ with DR, removing the dependence on $\rho^{-1}$. Asymptotic convergence to stationary-Nash points holds under mild assumptions, and the results tolerate inexact subproblem solutions whose optimality gaps are summable. The theory is instantiated for practical problems including Nonnegative Matrix Factorization (NMF) and constrained tensor factorization (CP/NCPD), yielding concrete results for MU/MUR and ALS-type methods and for Block Projected Gradient Descent (BPGD). Numerical experiments show that diminishing-radius strategies can accelerate convergence, particularly with nearly flat surrogates or ill-conditioned problems, while retaining the convergence guarantees. Overall, the paper provides new global rates and robustness results for BMM variants, which can guide the design of constrained nonconvex solvers for matrix and tensor factorization and related problems.
Abstract
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that sequentially minimizes majorizing surrogates of the objective function in each block while the other blocks are held fixed. BMM encompasses a large class of optimization algorithms, such as block coordinate descent and its proximal-point variant, expectation-maximization, and block projected gradient descent. We first establish that, for general constrained nonsmooth nonconvex optimization, BMM with $\rho$-strongly convex and $L_g$-smooth surrogates produces an $\varepsilon$-approximate first-order optimal point within $\widetilde{O}((1+L_g+\rho^{-1})\varepsilon^{-2})$ iterations and converges asymptotically to the set of first-order optimal points. Next, we show that BMM combined with trust-region methods with diminishing radius has an improved complexity of $\widetilde{O}((1+L_g)\varepsilon^{-2})$, independent of the inverse strong convexity parameter $\rho^{-1}$, allowing improved theoretical and practical performance with "flat" surrogates. Our results hold robustly even when the convex sub-problems are solved inexactly, as long as the optimality gaps are summable. Central to our analysis is a novel continuous first-order optimality measure, by which we bound the worst-case sub-optimality in each iteration by the first-order improvement the algorithm makes. We apply our general framework to obtain new results on various algorithms, such as the celebrated multiplicative update algorithm for nonnegative matrix factorization by Lee and Seung, regularized nonnegative tensor decomposition, and the classical block projected gradient descent algorithm. Lastly, we numerically demonstrate that the additional use of diminishing radius can improve the convergence rate of BMM in many instances.
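To make the block update concrete, the following is a minimal Python sketch of BMM with a diminishing radius on a toy problem. It is not the authors' implementation. Everything in it is an assumption made for illustration: the objective $f(x,y)=\tfrac{1}{2}(xy-a)^2$ with two scalar blocks constrained to an interval, the proximal surrogate $f + \tfrac{\rho}{2}(\cdot - x_0)^2$, the radius schedule $r_n = c/(\sqrt{n}\log(n+1))$, and the names `block_update` and `bmm_dr`. Because each block is one-dimensional, the surrogate minimizer over the intersection of the constraint interval and the trust region is obtained exactly by clipping.

```python
import math


def clip(v, lo, hi):
    """Project the scalar v onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def objective(x, y, a):
    """Toy two-block objective f(x, y) = 0.5 * (x*y - a)^2 (illustrative assumption)."""
    return 0.5 * (x * y - a) ** 2


def block_update(x, y, a, rho, radius, lo, hi):
    """Minimize the proximal surrogate of f(., y) at x over
    [lo, hi] intersected with the trust region [x - radius, x + radius].

    The surrogate is f(x', y) + (rho / 2) * (x' - x)^2, a quadratic in x'
    with curvature y^2 + rho, so its constrained minimizer is a clip.
    """
    grad = (x * y - a) * y          # partial derivative of f in the active block
    curv = y * y + rho              # surrogate curvature (strictly positive)
    target = x - grad / curv        # unconstrained surrogate minimizer
    return clip(target, max(lo, x - radius), min(hi, x + radius))


def bmm_dr(x, y, a=2.0, rho=1e-3, c=1.0, lo=0.0, hi=10.0, iters=200,
           use_radius=True):
    """Cyclic BMM over the blocks (x, y), optionally with diminishing radius.

    The schedule r_n = c / (sqrt(n) * log(n + 1)) is an assumed example of a
    radius sequence that is non-summable but square-summable.
    """
    history = [objective(x, y, a)]
    for n in range(1, iters + 1):
        radius = c / (math.sqrt(n) * math.log(n + 1)) if use_radius else math.inf
        x = block_update(x, y, a, rho, radius, lo, hi)
        y = block_update(y, x, a, rho, radius, lo, hi)
        history.append(objective(x, y, a))
    return x, y, history


if __name__ == "__main__":
    x, y, history = bmm_dr(0.5, 3.0)
    print(x, y, history[-1])
```

Since the surrogate majorizes the objective and agrees with it at the current iterate, each block update cannot increase the objective, with or without the radius constraint. Setting `use_radius=False` recovers the plain proximal block coordinate update for comparison.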
