Accelerated Proximal Gradient Methods in the affine-quadratic case: Strong convergence and limit identification
Walaa M. Moursi, Andrew Naguib, Viktor Pavlovic, Stephen A. Vavasis
TL;DR
This work analyzes accelerated proximal gradient methods in the affine-quadratic regime, where f is quadratic and g is the indicator of a closed affine subspace. By recasting APG updates in terms of an affine nonexpansive operator, the authors establish that the APG limit coincides with the best-approximation projection of the starting point onto the solution set, and that the difference between APG and PGM iterates vanishes weakly. Under mild conditions on the momentum parameters, strong convergence follows, and the results are shown to be tight via a two-dimensional counterexample demonstrating non-coincidence outside the affine-quadratic setting. The paper also extends the analysis to cones and affine subspaces and provides numerical experiments illustrating limit identification in underdetermined image reconstruction problems. Overall, the findings clarify when APG shares the PGM limit and under what conditions it converges strongly, contributing to the understanding and practical deployment of accelerated methods in convex optimization.
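For concreteness, the objects named in the summary can be written out as follows. This is only a sketch in assumed notation: the symbols Q, q, U, \gamma, \beta_k, T and S do not appear in the text above, and the precise conditions on the momentum parameters \beta_k are those of the paper and are not reproduced here. Q is taken to be a bounded, self-adjoint, positive semidefinite linear operator, U a closed affine subspace, and \gamma a step size in (0, 1/\|Q\|].

```latex
\begin{align*}
f(x) &= \tfrac12\langle x, Qx\rangle + \langle q, x\rangle, \qquad g = \iota_U, \\
T &= P_U \circ (\mathrm{Id} - \gamma\nabla f), \qquad \nabla f(x) = Qx + q, \\
\text{PGM:}\quad x_{k+1} &= T x_k, \\
\text{APG:}\quad y_k &= z_k + \beta_k (z_k - z_{k-1}), \qquad z_{k+1} = T y_k, \qquad z_0 = z_{-1} = x_0, \\
\text{claim:}\quad z_k - x_k &\rightharpoonup 0, \qquad z_k \rightharpoonup P_S(x_0), \qquad S = \operatorname{argmin}(f+g) = \operatorname{Fix} T.
\end{align*}
```

Since both P_U and \nabla f are affine, T is affine, and it is nonexpansive for the stated range of \gamma; this is the sense in which the APG update is recast through an affine nonexpansive operator.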
Abstract
Recent works by Bot-Fadili-Nguyen (arXiv:2510.22715) and by Jang-Ryu (arXiv:2510.23513) resolve long-standing iterate convergence questions for accelerated (proximal) gradient methods. In particular, Bot-Fadili-Nguyen prove weak convergence of discrete accelerated gradient descent (AGD) iterates and, crucially, convergence of the accelerated proximal gradient (APG) method in the composite setting, with extensions to infinite-dimensional Hilbert spaces. In parallel, Jang-Ryu establish point convergence for the continuous-time accelerated flow and for discrete AGD in finite dimensions. These results leave open which minimizer is selected by the iterates. We answer this in the affine-quadratic setting: when initialized at the same point, the difference between the proximal gradient (PGM) and APG iterates converges weakly to zero. Consequently, APG converges weakly to the best approximation of the initial point in the solution set. Moreover, under mild assumptions on the parameter sequence, we obtain strong convergence of APG. The result is tight: a two-dimensional example shows that coincidence of the APG and PGM limits is specific to the affine-quadratic regime and does not hold in general.
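The limit-identification claim can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the paper: the matrix A, the hyperplane data (a, c), the starting point x0, the step size 1/L, and the momentum schedule k/(k+3) are all assumptions chosen for the demonstration. It takes f(x) = 0.5*||Ax - b||^2 with an underdetermined A, lets g be the indicator of a hyperplane U, runs PGM and APG from the same x0, and compares both results with the projection of x0 onto the solution set.

```python
import numpy as np

def project_affine(z, a, c):
    """Project z onto the hyperplane U = {x : <a, x> = c}."""
    return z - ((a @ z - c) / (a @ a)) * a

def pgm_and_apg(A, b, a, c, x0, iters=2000):
    """Run PGM and APG from x0 on f(x) = 0.5*||Ax - b||^2, g = indicator of U."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2

    def T(z):
        # Forward-backward operator: gradient step on f, then projection onto U.
        return project_affine(z - step * (A.T @ (A @ z - b)), a, c)

    x_pgm = x0.copy()
    x_apg, x_apg_prev = x0.copy(), x0.copy()
    for k in range(iters):
        x_pgm = T(x_pgm)                      # plain proximal gradient step
        beta = k / (k + 3.0)                  # assumed momentum schedule
        y = x_apg + beta * (x_apg - x_apg_prev)
        x_apg_prev, x_apg = x_apg, T(y)       # accelerated step
    return x_pgm, x_apg

def best_approximation(A, b, a, c, x0):
    """Projection of x0 onto S = {x : Ax = b, <a, x> = c}, assuming S is nonempty."""
    M = np.vstack([A, a])
    d = np.append(b, c)
    return x0 - np.linalg.pinv(M) @ (M @ x0 - d)

# Hypothetical data: 2 equations, 4 unknowns, so minimizers are not unique.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 0.0]])
a = np.array([1.0, 1.0, 1.0, 1.0])
x_feasible = np.array([1.0, -1.0, 2.0, 0.5])  # used only to make the data consistent
b, c = A @ x_feasible, a @ x_feasible
x0 = np.array([3.0, 0.0, -1.0, 2.0])

x_pgm, x_apg = pgm_and_apg(A, b, a, c, x0)
x_star = best_approximation(A, b, a, c, x0)
print("PGM distance to projection:", np.linalg.norm(x_pgm - x_star))
print("APG distance to projection:", np.linalg.norm(x_apg - x_star))
```

Because the data are built to be consistent, the minimizers of f over U are exactly the solutions of the stacked linear system, which is why the pseudoinverse formula gives the best approximation of x0. In finite dimensions weak and strong convergence coincide, so this experiment illustrates only which minimizer is selected, not the distinction between the two modes of convergence.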
