The bilinear Hessian for large scale optimization

Marcus Carlsson, Viktor Nikitin, Erik Troedsson, Herwig Wendt

TL;DR

The paper addresses the prohibitive storage and computation of the Hessian in large-scale smooth optimization by introducing a basis-free bilinear Hessian and a Hessian operator that act directly on the space ${\mathbb X}$. It develops Taylor-based tools to obtain first- and second-order information without forming the Hessian, and shows how to implement Newton, Conjugate Gradient, and Quasi-Newton methods in a matrix-free fashion. A Poisson-noise imaging example demonstrates explicit gradient and Hessian formulations and reports substantial speedups over automatic differentiation, along with improved convergence and stability in conjugate gradient steps. The approach enables scalable second-order optimization for high-dimensional problems and has potential applications in imaging, ptychography, and machine learning, with local convergence guarantees for the gradient-descent variant.

Abstract

Second-order information is useful in many ways in smooth optimization problems, including for the design of step size rules and descent directions, or the analysis of the local properties of the objective functional. However, the computation and storage of the Hessian matrix using second-order partial derivatives is prohibitive in many contexts, and in particular in large-scale problems. In this work, we propose a new framework for computing and presenting second-order information in analytic form. The key novel insight is that the Hessian for a problem can be worked with efficiently by computing its bilinear form or operator form using Taylor expansions, instead of introducing a basis and then computing the Hessian matrix. Our new framework is suited for high-dimensional problems stemming e.g. from imaging applications, where computation of the Hessian matrix is infeasible. We also show how this can be used to implement Newton's step rule, Daniel's Conjugate Gradient rule, or Quasi-Newton schemes, without explicit knowledge of the Hessian matrix, and illustrate our findings with a simple numerical experiment.
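To make the matrix-free idea concrete, here is a minimal sketch (not the authors' implementation) of a Newton iteration in which the Hessian is only ever accessed through its action on a vector. The toy objective $f(x)=\sum_i e^{(Ax)_i}-d_i\,(Ax)_i$, the data `A` and `d`, and all function names are assumptions chosen for illustration. Expanding $f(x+\Delta x)$ to second order gives the gradient $A^T(e^{Ax}-d)$ and the Hessian operator $v\mapsto A^T(e^{Ax}\odot Av)$; the Newton system is then solved with a plain conjugate gradient loop.

```python
import math

# Hypothetical toy data, chosen only for illustration (not from the paper).
A = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]]
d = [2.0, 1.0, 3.0]

def apply_A(x):
    """Forward operator y = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def apply_At(y):
    """Adjoint operator x = A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient(x):
    """Gradient A^T (exp(A x) - d), read off the first-order Taylor term."""
    return apply_At([math.exp(yi) - di for yi, di in zip(apply_A(x), d)])

def hessian_operator(x, v):
    """Action v -> A^T (exp(A x) * (A v)); the Hessian matrix is never formed."""
    w = [math.exp(yi) * avi for yi, avi in zip(apply_A(x), apply_A(v))]
    return apply_At(w)

def solve_cg(apply_H, b, iters=20, tol=1e-12):
    """Solve H p = b for a symmetric positive definite H given only v -> H v."""
    p = [0.0] * len(b)
    r = list(b)          # residual b - H p for the initial guess p = 0
    q = list(r)          # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr <= tol * tol:
            break
        Hq = apply_H(q)
        alpha = rr / dot(q, Hq)
        p = [pi + alpha * qi for pi, qi in zip(p, q)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hq)]
        rr_new = dot(r, r)
        q = [ri + (rr_new / rr) * qi for ri, qi in zip(r, q)]
        rr = rr_new
    return p

def newton(x, steps=20):
    """Newton iteration: each step solves H|_x p = -grad f(x) by CG."""
    for _ in range(steps):
        g = gradient(x)
        p = solve_cg(lambda v: hessian_operator(x, v), [-gi for gi in g])
        x = [xi + pi for xi, pi in zip(x, p)]
    return x

x_newton = newton([0.0, 0.0])
```

The only problem-specific ingredients are `apply_A`, `apply_At`, and the pointwise weights; in an imaging setting these would typically be fast transforms rather than small dense arrays, which is where avoiding the explicit matrix pays off.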

Paper Structure

This paper contains 17 sections, 4 theorems, 60 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

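Let $\mathcal{F}:{\mathbb X}\rightarrow {\mathbb Y}$ be a $C^2$-function, where ${\mathbb X}$ and ${\mathbb Y}$ are spaces as in Assumption def1. Then, for each fixed $x\in{\mathbb X}$, there exists a unique linear operator $d\mathcal{F}|_x:{\mathbb X}\rightarrow {\mathbb Y}$ and a unique bilinear symmetric operator $d^2\mathcal{F}|_x:{\mathbb X}\times{\mathbb X}\rightarrow {\mathbb Y}$ such that $\mathcal{F}(x+\Delta x)=\mathcal{F}(x)+d\mathcal{F}|_x(\Delta x)+\frac{1}{2}d^2\mathcal{F}|_x(\Delta x,\Delta x)+o(\|\Delta x\|^2)$, where $o$ denotes "little ordo".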
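As a small worked illustration of how such a second-order Taylor expansion yields the linear and bilinear terms without any basis or matrix, consider the elementwise square $\mathcal{F}(x)=x\odot x$ on $\mathbb{R}^n$. This example and the symbol $d^2\mathcal{F}|_x$ for the bilinear term are assumptions made here for illustration; they are not taken from the paper.

```latex
\begin{align*}
\mathcal{F}(x+\Delta x)
  &= (x+\Delta x)\odot(x+\Delta x) \\
  &= \underbrace{x\odot x}_{\mathcal{F}(x)}
   + \underbrace{2\,x\odot\Delta x}_{d\mathcal{F}|_x(\Delta x)}
   + \underbrace{\Delta x\odot\Delta x}_{\frac{1}{2}\,d^2\mathcal{F}|_x(\Delta x,\Delta x)} .
\end{align*}
% Reading off the terms by their order in \Delta x:
%   d\mathcal{F}|_x(u)     = 2\, x \odot u   (linear in u)
%   d^2\mathcal{F}|_x(u,v) = 2\, u \odot v   (bilinear and symmetric in u, v)
% The remainder is exactly zero here, so the o(\|\Delta x\|^2) term vanishes.
```

Both terms are evaluated with elementwise products only, so the cost is linear in $n$, whereas a matrix representation of the bilinear term would require storing an $n\times n\times n$ array.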

Figures (3)

  • Figure 1: Given a fixed basis in ${\mathbb X}$ and ${\mathbb Y}$, all linear operators can be represented as matrices, and vice versa. For practical implementation, it is desirable to avoid matrix representations of linear operators (e.g., using FFT to evaluate Fourier transforms).
  • Figure 2: Illustration of step-length computation using the bilinear Hessian (BH-GD) for the Poisson-noise example.
  • Figure 3: Objective functional value vs. iteration number (top row) and vs. computation time (bottom row), for step length determined by quadratic approximation using the bilinear Hessian (left column) and by line search (right column, 50 objective function evaluations).
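The step-length rule referred to in Figures 2 and 3 can be sketched as follows, under stated assumptions. Along a direction $s$, the quadratic model $f(x+ts)\approx f(x)+t\,\langle\nabla f(x),s\rangle+\tfrac{t^2}{2}H(s,s)$ is minimized at $t=-\langle\nabla f(x),s\rangle/H(s,s)$, which needs a single evaluation of the bilinear form $H$ instead of repeated objective evaluations. The toy objective $f(x)=\sum_i e^{(Ax)_i}-d_i\,(Ax)_i$, the data, and the function names below are hypothetical and only loosely modeled on a Poisson-type likelihood; this is not the experiment reported in the paper.

```python
import math

# Hypothetical toy data, chosen only for illustration (not from the paper).
A = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]]
d = [2.0, 1.0, 3.0]

def apply_A(x):
    """Forward operator y = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(x):
    """Toy objective f(x) = sum_i exp((A x)_i) - d_i (A x)_i."""
    return sum(math.exp(yi) - di * yi for yi, di in zip(apply_A(x), d))

def gradient(x):
    """Gradient A^T (exp(A x) - d), read off the first-order Taylor term."""
    r = [math.exp(yi) - di for yi, di in zip(apply_A(x), d)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def bilinear_hessian(x, u, v):
    """Second-order Taylor term as a bilinear form:
    H(u, v) = sum_i exp((A x)_i) (A u)_i (A v)_i."""
    return sum(math.exp(yi) * ui * vi
               for yi, ui, vi in zip(apply_A(x), apply_A(u), apply_A(v)))

def bh_gradient_descent(x, iters=200):
    """Gradient descent whose step length minimizes the quadratic model along s."""
    for _ in range(iters):
        g = gradient(x)
        s = [-gi for gi in g]                   # steepest-descent direction
        curvature = bilinear_hessian(x, s, s)   # one scalar, no matrix
        if curvature <= 0.0:                    # zero gradient or no usable curvature
            break
        t = -dot(g, s) / curvature              # minimizer of the 1-D quadratic model
        x = [xi + t * si for xi, si in zip(x, s)]
    return x

x_star = bh_gradient_descent([0.0, 0.0])
```

Each iteration costs one gradient and one bilinear-form evaluation. A line search with a fixed budget of objective evaluations per step, as in the right column of Figure 3, would instead call `f` many times per iteration.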

Theorems & Definitions (11)

  • Definition 1
  • Definition 2
  • Theorem 1
  • proof
  • Definition 3
  • Definition 4
  • Theorem 2
  • proof
  • Theorem 3
  • Proposition 1
  • ...and 1 more