Quasi-Newton methods for minimizing a quadratic function subject to uncertainty
Shen Peng, Gianpiero Canessa, David Ek, Anders Forsgren
TL;DR
The paper studies quasi-Newton methods for minimizing a strictly convex quadratic function when gradient evaluations are noisy, focusing on low-rank updates such as memoryless BFGS and a symmetric-CG variant. It develops a chance-constrained framework for computing search directions that are robust to noise, and uses the Sherman-Morrison formula to reduce the low-rank updates to a one-parameter optimization problem that can be solved either deterministically or by sample average approximation. Computational results on random problems and CUTEst benchmarks show that the chance-constrained quasi-Newton methods are robust to gradient noise and often outperform traditional methods, and that memoryless BFGS sometimes outperforms BFGS when the noise is large. The work highlights a trade-off between robustness and accuracy on the one hand and computational cost on the other, and suggests directions for extending robust quasi-Newton methods to broader stochastic optimization settings.
Abstract
We investigate quasi-Newton methods for minimizing a strictly convex quadratic function that is subject to errors in the evaluation of the gradients. The methods all give identical behavior in exact arithmetic, generating minimizers of Krylov subspaces of increasing dimensions, thereby having finite termination. A BFGS quasi-Newton method is empirically known to behave very well on a quadratic problem subject to small errors. We also investigate large-error scenarios, in which the expected behavior is not so clear. In particular, we are interested in the behavior of quasi-Newton matrices that differ from the identity by a low-rank matrix, such as a memoryless BFGS method. Our numerical results indicate that for large errors, a memoryless quasi-Newton method often outperforms a BFGS method. We also consider a more advanced model for generating search directions, based on solving a chance-constrained optimization problem. Our results indicate that such a model often gives a slight advantage in final accuracy, although the computational cost is significantly higher.
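
To make the setting in the abstract concrete, the following is a minimal sketch of a memoryless BFGS iteration with exact line search on a strictly convex quadratic f(x) = 0.5 x'Hx + c'x. It is an illustration under stated assumptions, not the authors' implementation: the function name `memoryless_bfgs_quadratic`, the additive Gaussian noise model with parameter `noise_std`, the use of noisy gradient differences for `y`, and the steepest-descent fallback when the curvature condition fails are all choices made for this example.

```python
import numpy as np

def memoryless_bfgs_quadratic(H, c, x0, noise_std=0.0, max_iter=None, seed=0):
    """Minimize 0.5*x'Hx + c'x using memoryless BFGS search directions.

    Assumptions of this sketch (not taken from the paper): gradients are
    perturbed by i.i.d. Gaussian noise, the step length is the exact
    minimizer along p computed from the noisy gradient, and we fall back to
    the negative gradient whenever s'y is not positive.
    """
    rng = np.random.default_rng(seed)
    n = len(c)
    max_iter = n if max_iter is None else max_iter
    x = np.asarray(x0, dtype=float).copy()

    def noisy_grad(z):
        return H @ z + c + noise_std * rng.standard_normal(n)

    g = noisy_grad(x)
    p = -g
    for _ in range(max_iter):
        curv = p @ (H @ p)
        if np.linalg.norm(g) < 1e-12 or curv <= 0.0:
            break
        alpha = -(g @ p) / curv          # exact line search on the quadratic
        s = alpha * p
        x = x + s
        g_new = noisy_grad(x)
        y = g_new - g
        sy = s @ y
        if sy > 0.0:
            # Apply the inverse BFGS update of the identity to g_new:
            # (I - rho*s*y')(I - rho*y*s') g_new + rho*s*s' g_new
            rho = 1.0 / sy
            a = rho * (s @ g_new)
            q = g_new - a * y
            b = rho * (y @ q)
            p = -(q + (a - b) * s)
        else:
            p = -g_new                   # fallback: steepest descent
        g = g_new
    return x

# Small usage example on a random, well-conditioned problem.
rng = np.random.default_rng(1)
n = 10
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)
c = rng.standard_normal(n)
x_star = np.linalg.solve(H, -c)

x_exact = memoryless_bfgs_quadratic(H, c, np.zeros(n))
x_noisy = memoryless_bfgs_quadratic(H, c, np.zeros(n), noise_std=1e-2, max_iter=5 * n)
print("error without noise:", np.linalg.norm(x_exact - x_star))
print("error with noise:   ", np.linalg.norm(x_noisy - x_star))
```

Without noise, this iteration produces the same iterates as the conjugate gradient method, so it reaches the minimizer in at most n iterations up to rounding error. With noise, the final accuracy depends on the noise level and on the safeguards chosen, which is the regime the paper investigates.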
