The bilinear Hessian for large scale optimization
Marcus Carlsson, Viktor Nikitin, Erik Troedsson, Herwig Wendt
TL;DR
The paper addresses the prohibitive storage and computation of the Hessian in large-scale smooth optimization by introducing a basis-free bilinear Hessian and a Hessian operator that act directly on the space ${\mathbb X}$. It develops Taylor-based tools to obtain first- and second-order information without forming the Hessian, and shows how to implement Newton, Conjugate Gradient, and Quasi-Newton methods in a matrix-free fashion. A Poisson-noise imaging example demonstrates explicit gradient and Hessian formulations and reports substantial speedups over automatic differentiation, along with improved convergence and stability in conjugate gradient steps. The approach enables scalable second-order optimization for high-dimensional problems and has potential applications in imaging, ptychography, and machine learning, with local convergence guarantees for the gradient-descent variant.
Abstract
Second-order information is useful in many ways in smooth optimization problems, including for the design of step size rules and descent directions, or the analysis of the local properties of the objective functional. However, the computation and storage of the Hessian matrix using second-order partial derivatives is prohibitive in many contexts, and in particular in large-scale problems. In this work, we propose a new framework for computing and presenting second-order information in analytic form. The key novel insight is that the Hessian for a problem can be worked with efficiently by computing its bilinear form or operator form using Taylor expansions, instead of introducing a basis and then computing the Hessian matrix. Our new framework is suited for high-dimensional problems stemming, e.g., from imaging applications, where computation of the Hessian matrix is infeasible. We also show how this can be used to implement Newton's step rule, Daniel's Conjugate Gradient rule, or Quasi-Newton schemes, without explicit knowledge of the Hessian matrix, and illustrate our findings with a simple numerical experiment.
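To make the idea of working with the Hessian only through its bilinear and operator forms concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's experiment: the toy objective $f(x) = \sum_i \big(e^{(Ax)_i} - d_i (Ax)_i\big)$, the matrix `A`, the data `d`, and all function names are assumptions chosen for illustration. Expanding $f(x+u)$ to second order in $u$ gives the bilinear form $\mathcal{H}(u, v) = \sum_i e^{(Ax)_i} (Au)_i (Av)_i$, and the Hessian operator follows by moving `A` to the other side of the inner product. The Newton direction is then approximated with a standard linear conjugate gradient loop that only applies the operator and never forms a matrix.

```python
import numpy as np

def f(A, d, x):
    # Toy Poisson-type objective (an assumption, not the paper's example).
    z = A @ x
    return np.sum(np.exp(z) - d * z)

def gradient(A, d, x):
    # First-order Taylor term: <grad f(x), u> = sum_i (exp(z_i) - d_i) (A u)_i.
    return A.T @ (np.exp(A @ x) - d)

def bilinear_hessian(A, x, u, v):
    # Second-order Taylor term: H(u, v) = sum_i exp(z_i) (A u)_i (A v)_i.
    return np.sum(np.exp(A @ x) * (A @ u) * (A @ v))

def hessian_operator(A, x, v):
    # Operator form, defined by H(u, v) = <u, hessian_operator(v)>.
    return A.T @ (np.exp(A @ x) * (A @ v))

def newton_direction(A, d, x, iters=50, tol=1e-10):
    # Approximately solve H s = -grad f(x) by linear conjugate gradient,
    # using only applications of the Hessian operator.
    g = gradient(A, d, x)
    s = np.zeros_like(x)
    r = -g.copy()          # residual of H s = -g at s = 0
    p = r.copy()           # initial search direction
    rr = r @ r
    for _ in range(iters):
        if np.sqrt(rr) < tol:
            break
        Hp = hessian_operator(A, x, p)
        alpha = rr / (p @ Hp)
        s += alpha * p
        r -= alpha * Hp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return s
```

Each function costs a few applications of `A` and its transpose, so memory use scales with the size of `x` rather than its square. The sketch assumes `A` has full column rank, so that the Hessian is positive definite and the conjugate gradient loop is well defined.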
