A Normal Map-Based Proximal Stochastic Gradient Method: Convergence and Identification Properties
Junwen Qiu, Li Jiang, Andre Milzarek
TL;DR
This work introduces NSGD, a normal map-based proximal stochastic gradient method for nonconvex composite problems that does not require variance reduction. By leveraging Robinson's normal map and a carefully designed merit function, NSGD converges globally to stationary points almost surely, admits nonasymptotic complexity bounds comparable to those of PSGD, and identifies active manifolds in finite time under mild definability assumptions, using a Kurdyka-Łojasiewicz framework. The approach yields stronger identification properties than standard PSGD at a similar computational cost per iteration. The theoretical results are complemented by numerical experiments on nonconvex classification and sparse plus low-rank decomposition, in which NSGD produces sparser and lower-rank solutions and converges faster. Overall, the normal map perspective offers a variance-reduction-free route to convergence and identification in stochastic nonconvex optimization.
Abstract
The proximal stochastic gradient method (PSGD) is one of the state-of-the-art approaches for stochastic composite-type problems. In contrast to its deterministic counterpart, PSGD has been found to have difficulties with the correct identification of underlying substructures (such as supports, low-rank patterns, or active constraints), and it does not possess a finite-time manifold identification property. Existing solutions rely on convexity assumptions or on the additional use of variance reduction techniques. In this paper, we address these limitations and present a simple variant of PSGD based on Robinson's normal map. The proposed normal map-based proximal stochastic gradient method (NSGD) is shown to converge globally, i.e., accumulation points of the generated iterates correspond to stationary points almost surely. In addition, we establish complexity bounds for NSGD that match the known results for PSGD, and we prove that NSGD can almost surely identify active manifolds in finite time in a general nonconvex setting. Our derivations are built on almost sure iterate convergence guarantees and utilize analysis techniques based on the Kurdyka-Łojasiewicz inequality.
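
To make the idea of a normal map-based step concrete, the following is a minimal sketch and not the authors' reference implementation. It assumes a composite problem min f(x) + phi(x) and the usual form of the normal map, F_nor(z) = grad f(prox(z)) + (z - prox(z)) / lam, where prox denotes the proximal operator of lam * phi. The stochastic step on the auxiliary variable z followed by a proximal step, the choice phi = l1_weight * ||x||_1, the toy quadratic objective, the step size schedule, and all names (soft_threshold, normal_map_sgd, noisy_grad) are assumptions made for illustration; none of them is stated in the abstract.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def normal_map_sgd(stoch_grad, z0, lam, step_sizes, l1_weight, rng):
    """Hypothetical sketch of a normal map-based proximal stochastic gradient loop.

    stoch_grad(x, rng) returns a stochastic estimate of grad f(x).
    The iteration runs on the auxiliary variable z; x = prox(z) is the
    iterate whose support one would inspect.
    """
    z = np.asarray(z0, dtype=float).copy()
    x = soft_threshold(z, lam * l1_weight)
    for alpha in step_sizes:
        g = stoch_grad(x, rng)
        # Stochastic estimate of the normal map at z (assumed form).
        normal_map_estimate = g + (z - x) / lam
        z = z - alpha * normal_map_estimate
        x = soft_threshold(z, lam * l1_weight)
    return x, z


# Toy problem (illustrative only): f(x) = 0.5 * ||x - b||^2 with noisy gradients.
b = np.array([2.0, 0.05, -0.03])


def noisy_grad(x, rng):
    return (x - b) + 0.1 * rng.standard_normal(x.shape)


rng = np.random.default_rng(0)
steps = [0.5 / (k + 1) ** 0.75 for k in range(2000)]  # diminishing step sizes
x, z = normal_map_sgd(noisy_grad, np.zeros(3), lam=0.5,
                      step_sizes=steps, l1_weight=0.5, rng=rng)
```

In this toy setup the minimizer is soft_threshold(b, 0.5), that is, (1.5, 0, 0). Because the noise enters through z and the proximal step is applied last, the two small entries of x stay exactly zero as long as the corresponding entries of z remain inside the thresholding interval. This is the kind of support identification behavior the abstract refers to, shown here only on a hand-picked example.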
