Randomized block proximal method with locally Lipschitz continuous gradient
Pedro Pérez-Aros, David Torregrosa-Belén
TL;DR
This work addresses large-scale nonconvex optimization of the form $\min_x \varphi(x)=f(x)+g(x)$ with $f$ differentiable and $g$ block-separable, assuming only that the partial gradients of $f$ are blockwise locally Lipschitz rather than globally Lipschitz continuous. It introduces the Adaptive Randomized Block Proximal Gradient (ARBPG) method, which randomly selects blocks and uses an adaptive proximal stepsize to guarantee descent, optionally augmented by a boosted linesearch, without prior knowledge of local Lipschitz constants. The authors prove almost sure subsequential convergence to stationary points and establish a positive lower bound on the stepsizes along bounded subsequences, so that the steps do not shrink to zero. Numerical experiments on nonnegative matrix factorization for image compression demonstrate competitive performance and robustness to the relaxed gradient regularity assumptions. Code and data are publicly available.
Abstract
Block-coordinate algorithms are known to provide efficient iterative schemes for large-scale problems, especially when computing full derivatives demands substantial memory and computational effort. In this paper, we investigate a randomized block proximal gradient algorithm for minimizing the sum of a differentiable function and a separable proper lower-semicontinuous function, both possibly nonconvex. In contrast to previous works, we only assume that the partial gradients of the differentiable function are locally Lipschitz continuous. At each iteration, the method adaptively selects a proximal stepsize that satisfies a sufficient decrease condition, without prior knowledge of the local Lipschitz moduli of these partial gradients. In addition, we allow for an optional extra linesearch to enhance the performance of the algorithm. Our main result establishes almost sure subsequential convergence to a stationary point of the problem. Finally, we validate the method numerically on an image compression experiment using a nonnegative matrix factorization model.
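To make the iteration concrete, the following is a minimal sketch of a randomized block proximal gradient step with a backtracked stepsize, applied to a two-block nonnegative matrix factorization $\min_{U,V\ge 0} \tfrac12\|X-UV\|_F^2$. It is an illustration under stated assumptions, not the authors' ARBPG implementation: the function name `randomized_block_prox_grad`, the parameters `gamma0`, `shrink`, `sigma`, and `max_backtracks`, and the specific sufficient decrease test $\varphi(x^+)\le\varphi(x)-\frac{\sigma}{2\gamma}\|x_i^+-x_i\|^2$ are all hypothetical choices, and the optional boosted linesearch is omitted.

```python
import numpy as np


def objective(X, U, V):
    # Smooth part f(U, V) = 0.5 * ||X - U V||_F^2; g is the indicator of
    # the nonnegative orthant, which is zero at every feasible point.
    return 0.5 * np.linalg.norm(X - U @ V) ** 2


def partial_grad(X, U, V, block):
    # Partial gradient of f with respect to U (block 0) or V (block 1).
    R = U @ V - X
    return R @ V.T if block == 0 else U.T @ R


def randomized_block_prox_grad(X, rank, iters=200, gamma0=1.0, shrink=0.5,
                               sigma=0.1, max_backtracks=60, seed=0):
    """Illustrative sketch only; names and the decrease test are assumptions."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    blocks = [rng.random((m, rank)), rng.random((rank, n))]
    gammas = [gamma0, gamma0]  # one remembered stepsize per block
    current = objective(X, blocks[0], blocks[1])
    history = [current]

    for _ in range(iters):
        i = int(rng.integers(2))  # pick a block uniformly at random
        grad = partial_grad(X, blocks[0], blocks[1], i)
        gamma = gammas[i]
        accepted = False
        for _ in range(max_backtracks):
            # Prox of the nonnegativity indicator is projection onto x >= 0.
            cand = np.maximum(blocks[i] - gamma * grad, 0.0)
            trial = list(blocks)
            trial[i] = cand
            new_val = objective(X, trial[0], trial[1])
            decrease = sigma / (2.0 * gamma) * np.sum((cand - blocks[i]) ** 2)
            if new_val <= current - decrease:  # assumed sufficient decrease test
                accepted = True
                break
            gamma *= shrink  # no Lipschitz constant needed: just shrink and retry
        if accepted:
            blocks[i], gammas[i], current = cand, gamma, new_val
        history.append(current)

    return blocks[0], blocks[1], history


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    X_demo = demo_rng.random((20, 3)) @ demo_rng.random((3, 15))
    U, V, hist = randomized_block_prox_grad(X_demo, rank=3)
    print(f"objective: {hist[0]:.4f} -> {hist[-1]:.4f}")
```

The backtracking loop is what replaces knowledge of the Lipschitz moduli: the stepsize is shrunk until the decrease test holds, and the accepted value is reused as the starting trial the next time that block is drawn. Reusing the stepsize without ever enlarging it is a simplification made here for brevity.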
