An AD based library for Efficient Hessian and Hessian-Vector Product Computation on GPU

Desh Ranjan, Mohammad Zubair

TL;DR

CHESSFAD is evaluated on a large number of independent Hessian-vector products over a set of standard test functions, and its performance is compared with that of existing header-based C++ libraries such as autodiff.

Abstract

Hessian-vector product computation appears in many scientific applications, such as optimization and finite element modeling. Often, Hessian-vector products must be computed at many data points concurrently. We propose an automatic differentiation (AD) based method, CHESSFAD (Chunked HESSian using Forward-mode AD), that is designed for efficient parallel computation of Hessians and Hessian-vector products. CHESSFAD computes second-order derivatives using forward mode and exposes parallelism at different levels that can be exploited on accelerators such as NVIDIA GPUs. In the CHESSFAD approach, the computation of a row of the Hessian matrix is independent of the computation of the other rows, so rows of the Hessian matrix can be computed concurrently. A second level of parallelism is exposed because CHESSFAD partitions the computation of a Hessian row into chunks, and different chunks can be computed concurrently. CHESSFAD is implemented as a lightweight header-based C++ library that works on both CPUs and GPUs. We evaluate the performance of CHESSFAD on a large number of independent Hessian-vector products over a set of standard test functions and compare it with that of existing header-based C++ libraries such as autodiff. Our results show that CHESSFAD performs better than autodiff on all of these functions, with average improvements ranging from 5% to 50%.
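To make the row and chunk decomposition described in the abstract concrete, the following is a minimal CPU-only sketch of second-order forward-mode AD. It is an illustration under stated assumptions, not the CHESSFAD implementation: the type Dual2, the functions hessian_row_chunk, hessian_vector_product, and example_hvp, the chunk size, and the test function are all hypothetical names and choices introduced here. Each call to hessian_row_chunk computes the entries of one Hessian row for one chunk of columns, and no call depends on any other, which is the kind of independence that could be mapped to parallel threads.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical second-order dual number (an assumption for illustration,
// not the paper's hDual class). For a fixed row index i and a chunk of C
// column indices j, it carries:
//   val : f
//   di  : df/dx_i
//   dj  : df/dx_j           for each j in the chunk
//   dij : d2f/(dx_i dx_j)   for each j in the chunk
template <std::size_t C>
struct Dual2 {
    double val = 0.0;
    double di = 0.0;
    std::array<double, C> dj{};
    std::array<double, C> dij{};
};

// Sum rule applied to every tracked component.
template <std::size_t C>
Dual2<C> operator+(const Dual2<C>& a, const Dual2<C>& b) {
    Dual2<C> r;
    r.val = a.val + b.val;
    r.di = a.di + b.di;
    for (std::size_t k = 0; k < C; ++k) {
        r.dj[k] = a.dj[k] + b.dj[k];
        r.dij[k] = a.dij[k] + b.dij[k];
    }
    return r;
}

// Product rule, including the mixed second-order term.
template <std::size_t C>
Dual2<C> operator*(const Dual2<C>& a, const Dual2<C>& b) {
    Dual2<C> r;
    r.val = a.val * b.val;
    r.di = a.di * b.val + a.val * b.di;
    for (std::size_t k = 0; k < C; ++k) {
        r.dj[k] = a.dj[k] * b.val + a.val * b.dj[k];
        r.dij[k] = a.dij[k] * b.val + a.di * b.dj[k]
                 + a.dj[k] * b.di + a.val * b.dij[k];
    }
    return r;
}

// Entries H[i][j0 .. j0+C-1] of the Hessian of f at x.
// Columns past the end of x come back as zero.
template <std::size_t C, class F>
std::array<double, C> hessian_row_chunk(F f, const std::vector<double>& x,
                                        std::size_t i, std::size_t j0) {
    std::vector<Dual2<C>> xd(x.size());
    for (std::size_t p = 0; p < x.size(); ++p) {
        xd[p].val = x[p];
        if (p == i) xd[p].di = 1.0;                         // seed row direction
        if (p >= j0 && p < j0 + C) xd[p].dj[p - j0] = 1.0;  // seed chunk columns
    }
    return f(xd).dij;
}

// Hessian-vector product assembled from independent (row, chunk) tasks.
template <std::size_t C, class F>
std::vector<double> hessian_vector_product(F f, const std::vector<double>& x,
                                           const std::vector<double>& v) {
    const std::size_t n = x.size();
    std::vector<double> hv(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {            // rows are independent
        for (std::size_t j0 = 0; j0 < n; j0 += C) {  // chunks are independent
            const auto chunk = hessian_row_chunk<C>(f, x, i, j0);
            for (std::size_t k = 0; k < C && j0 + k < n; ++k)
                hv[i] += chunk[k] * v[j0 + k];
        }
    }
    return hv;
}

// Example: f(x) = x0*x0*x1 + x1*x2 at x = (1, 2, 3) with v = (1, 1, 1).
// The analytic Hessian is [[4, 2, 0], [2, 0, 1], [0, 1, 0]].
inline std::vector<double> example_hvp() {
    auto f = [](const auto& x) { return x[0] * x[0] * x[1] + x[1] * x[2]; };
    return hessian_vector_product<2>(f, {1.0, 2.0, 3.0}, {1.0, 1.0, 1.0});
}
```

In this sketch the chunk size C trades the amount of state carried per dual number against the number of passes over the function. Only + and * are overloaded, so a real test function such as Rosenbrock or Ackley would need further operators and elementary functions.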

Paper Structure

This paper contains 13 sections, 1 equation, 13 figures, 3 tables, 10 algorithms.

Figures (13)

  • Figure 1: Templated C++ class hDual with overloaded operators + and *.
  • Figure 2: CUDA code segment for $L2$ implementation.
  • Figure 3: Execution time trend for sequential implementations for Rosenbrock function.
  • Figure 4: Execution time trend for sequential implementations for Ackley function.
  • Figure 5: Execution time trend for sequential implementations for Fletcher-Powell function.
  • ...and 8 more figures