PETScML: Second-order solvers for training regression problems in Scientific Machine Learning

Stefano Zampini, Umberto Zerbinati, George Turkiyyah, David Keyes

TL;DR

This work empirically demonstrates the superior efficacy of a trust-region method based on the Gauss-Newton approximation of the Hessian in reducing the generalization errors of regression tasks when learning surrogate models, across a wide range of scientific machine-learning techniques and test cases.
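
The phrase "Gauss-Newton approximation of the Hessian" can be made concrete. The derivation below assumes a mean-squared-error empirical risk over N training pairs (x_i, y_i) and a network f(x; θ). The notation is ours and is not necessarily the paper's formulation. Under these assumptions, the Hessian splits into a Jacobian term and a residual-weighted curvature term, and Gauss-Newton keeps only the first:

```latex
\begin{aligned}
\mathcal{L}(\theta) &= \frac{1}{2N}\sum_{i=1}^{N} \lVert r_i(\theta)\rVert_2^2,
  \qquad r_i(\theta) = f(x_i;\theta) - y_i,
  \qquad J_i = \frac{\partial r_i}{\partial \theta},\\
\nabla\mathcal{L}(\theta) &= \frac{1}{N}\sum_{i=1}^{N} J_i^{\top} r_i,\\
\nabla^2\mathcal{L}(\theta) &= \frac{1}{N}\sum_{i=1}^{N} J_i^{\top} J_i
  + \frac{1}{N}\sum_{i=1}^{N}\sum_{k} r_{i,k}\,\nabla^2 r_{i,k}(\theta)
  \;\approx\; \frac{1}{N}\sum_{i=1}^{N} J_i^{\top} J_i =: G(\theta).
\end{aligned}
```

G(θ) is symmetric positive semidefinite by construction, so a quadratic model built on it never has negative curvature. The dropped term vanishes as the residuals r_i go to zero, which is why the approximation tends to be accurate when a model fits smooth data closely. A product G(θ)v can also be formed without assembling G: apply one Jacobian-vector product J_i v, then one vector-Jacobian product J_i^T (J_i v).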

Abstract

In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for analyzing, by means of deep-learning techniques, the data produced by computational science and engineering applications. At the core of these methods is the supervised training algorithm that learns the neural network realization, a highly non-convex optimization problem usually solved with stochastic gradient methods. However, unlike typical deep-learning practice, scientific machine-learning training problems feature a much larger volume of smooth data and better characterizations of the empirical risk functions, which make them well suited to conventional solvers for unconstrained optimization. We introduce a lightweight software framework built on top of the Portable and Extensible Toolkit for Scientific Computation (PETSc) to bridge the gap between deep-learning software and conventional solvers for unconstrained minimization. We empirically demonstrate the superior efficacy of a trust-region method based on the Gauss-Newton approximation of the Hessian in reducing the generalization errors of regression tasks when learning surrogate models for a wide range of scientific machine-learning techniques and test cases. All the conventional second-order solvers tested, including L-BFGS and inexact Newton with line search, compare favorably, in terms of either cost or accuracy, with the adaptive first-order methods used to validate the surrogate models.
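
The trust-region Gauss-Newton approach singled out above can be illustrated on a toy problem. The code below is a minimal pure-Python sketch under our own assumptions. It is not the PETScML or PETSc implementation. It makes the following choices:

- A hypothetical two-parameter model y = a * exp(b * x) is fitted to noise-free synthetic data by minimizing 0.5 * ||r||^2.
- The Gauss-Newton matrix J^T J is applied matrix-free.
- Each trust-region subproblem is solved inexactly with Steihaug-Toint conjugate gradients.
- The radius-update constants (0.25, 0.75, 1e-4) are common textbook choices, not values taken from the paper.

The names `residuals_and_jacobian`, `steihaug_cg`, and `trust_region_gauss_newton` are ours and purely illustrative.

```python
import math

# Hypothetical toy regression (not from the paper): fit y = a * exp(b * x).
XS = [i / 10 for i in range(21)]
YS = [2.0 * math.exp(-1.5 * x) for x in XS]

def residuals_and_jacobian(theta):
    # r_i = a * exp(b x_i) - y_i; row i of J is [dr_i/da, dr_i/db].
    a, b = theta
    e = [math.exp(b * x) for x in XS]
    r = [a * ei - y for ei, y in zip(e, YS)]
    J = [[ei, a * x * ei] for ei, x in zip(e, XS)]
    return r, J

def loss(theta):
    # 0.5 * ||r||^2; a trial point whose exp() overflows counts as infinitely bad.
    try:
        r, _ = residuals_and_jacobian(theta)
    except OverflowError:
        return math.inf
    return 0.5 * sum(ri * ri for ri in r)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def axpy(alpha, x, y):
    # Elementwise alpha * x + y.
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def to_boundary(p, d, delta):
    # Largest tau >= 0 with ||p + tau * d|| = delta, given ||p|| < delta.
    a, b, c = dot(d, d), 2.0 * dot(p, d), dot(p, p) - delta * delta
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def steihaug_cg(matvec, g, delta, tol, max_iter):
    # Inexactly minimize m(p) = g.p + 0.5 * p.Bp subject to ||p|| <= delta.
    p = [0.0] * len(g)
    r = list(g)                      # model gradient g + B p at p = 0
    d = [-ri for ri in r]
    for _ in range(max_iter):
        Bd = matvec(d)
        dBd = dot(d, Bd)
        if dBd <= 0.0:               # no positive curvature along d: stop on the boundary
            return axpy(to_boundary(p, d, delta), d, p)
        alpha = dot(r, r) / dBd
        p_next = axpy(alpha, d, p)
        if math.sqrt(dot(p_next, p_next)) >= delta:  # CG step leaves the trust region
            return axpy(to_boundary(p, d, delta), d, p)
        r_next = axpy(alpha, Bd, r)
        if math.sqrt(dot(r_next, r_next)) < tol:
            return p_next
        beta = dot(r_next, r_next) / dot(r, r)
        d = axpy(beta, d, [-ri for ri in r_next])
        p, r = p_next, r_next
    return p

def trust_region_gauss_newton(theta, delta=1.0, max_delta=100.0, gtol=1e-10, max_iter=200):
    f = loss(theta)
    for it in range(max_iter):
        r, J = residuals_and_jacobian(theta)
        g = [dot([row[j] for row in J], r) for j in range(len(theta))]  # gradient J^T r
        gnorm = math.sqrt(dot(g, g))
        if gnorm < gtol:
            return theta, f, it

        def gn_matvec(v):
            # Gauss-Newton product J^T (J v), never forming J^T J explicitly.
            Jv = [dot(row, v) for row in J]
            return [dot([row[j] for row in J], Jv) for j in range(len(v))]

        p = steihaug_cg(gn_matvec, g, delta, min(0.5, math.sqrt(gnorm)) * gnorm, 2 * len(theta))
        pred = -(dot(g, p) + 0.5 * dot(p, gn_matvec(p)))  # decrease predicted by the model
        trial = axpy(1.0, p, theta)
        f_trial = loss(trial)
        rho = (f - f_trial) / pred if pred > 0.0 and math.isfinite(f_trial) else -1.0
        if rho < 0.25:               # poor agreement: shrink the region
            delta *= 0.25
        elif rho > 0.75 and math.sqrt(dot(p, p)) >= 0.99 * delta:
            delta = min(2.0 * delta, max_delta)
        if rho > 1e-4:               # enough actual decrease: accept the step
            theta, f = trial, f_trial
    return theta, f, max_iter

theta, f, iters = trust_region_gauss_newton([1.0, 0.0])
print(f"a = {theta[0]:.6f}, b = {theta[1]:.6f}, loss = {f:.2e}, iterations = {iters}")
```

Because J^T J is positive semidefinite, the `dBd <= 0` branch fires only for directions in the null space of J. In a neural-network setting, the products J v and J^T w would typically come from forward- and reverse-mode automatic differentiation rather than from an explicit Jacobian.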

Paper Structure (15 sections, 21 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: FNO Burgers' test case. Convergence histories for loss function values and testing metrics in terms of epochs (top row, panels A and B) and oracle calls (bottom row, panels C and D) for different solvers. The values in the legends denote the minimum metric value achieved. See Section \ref{sec:results_fno_burgers} for details.
  • Figure 2: FNO Burgers' test case with a hybrid solver. Convergence histories for testing metrics starting from checkpointed solutions of the reference solver. The values in the legends denote the checkpointed epoch. See Section \ref{sec:results_fno_burgers} for details.
  • Figure 3: FNO Navier-Stokes test case. Convergence histories for loss function values and testing metrics in terms of epochs (top row, panels A and B) and oracle calls (bottom row, panels C and D) for different solvers. The values in the legends denote the minimum metric value achieved. See Section \ref{sec:results_fno_ns} for details.
  • Figure 4: DeepONet reaction-diffusion test case. Convergence histories for loss function values and testing metrics in terms of epochs (top row, panels A and B) and oracle calls (bottom row, panels C and D) for different solvers. The values in the legends denote the minimum metric value achieved. See Section \ref{sec:results_deeponet_rd} for details.
  • Figure 5: DeepONet reaction-diffusion oracle calls breakdown. Test metric (black, left-most y-axis), objective function evaluations (shaded blue, right-most y-axis), and number of Hessian matrix-vector products (shaded red, right-most y-axis) against epoch number for different solvers (shown on top). See Section \ref{sec:results_deeponet_rd} for details.