Adaptive Gradient Enhanced Gaussian Process Surrogates for Inverse Problems

Phillip Semler, Martin Weiser

TL;DR

This work tackles the computational burden of constructing accurate surrogates for inverse problems with expensive forward models by employing gradient-enhanced Gaussian process regression (GEGPR) in a fully adaptive, budget-aware design. It introduces a two-part framework: (i) a gradient-enhanced GP surrogate that models both forward-model values and derivatives, and (ii) a greedy, two-stage design strategy that first selects promising evaluation points and then optimizes the forward-model tolerances under a fixed work budget. The authors derive an accuracy model linking surrogate error to parameter reconstruction error and couple it with a power-law work model to guide adaptive data generation. Numerical experiments on an analytical 2D problem and a PDE-based scatterometry test show that including gradient information yields substantial efficiency gains (often two orders of magnitude) over value-only designs and over fixed-position approaches, while the reconstructions meet the prescribed tolerances. The methodology enables more efficient offline data generation for surrogate-based inverse problem solvers, with potential impact on real-time parameter identification in computationally intensive applications.
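To make the idea of a gradient-enhanced surrogate concrete, the following is a minimal 1D sketch, not the authors' implementation. It assumes a squared-exponential kernel with a fixed, hand-picked length scale `ell`, noise-free data, and a toy forward model `sin`; the function names `rbf_blocks`, `gegpr_mean`, and `gpr_mean` are hypothetical. The joint covariance of values and derivatives is assembled from the kernel and its derivatives, and the posterior mean is compared with a value-only GP on the same points.

```python
import numpy as np

def rbf_blocks(xa, xb, ell):
    """Squared-exponential kernel k(xa, xb) and its derivative blocks (1D)."""
    d = xa[:, None] - xb[None, :]
    k = np.exp(-0.5 * d**2 / ell**2)
    k_fg = k * d / ell**2                      # cov(f(xa), f'(xb)) = dk/dxb
    k_gg = k * (1.0 / ell**2 - d**2 / ell**4)  # cov(f'(xa), f'(xb)) = d2k/(dxa dxb)
    return k, k_fg, k_gg

def gegpr_mean(x_train, y, dy, x_test, ell=1.0, jitter=1e-10):
    """Posterior mean of a GP conditioned on values y and derivatives dy."""
    k, k_fg, k_gg = rbf_blocks(x_train, x_train, ell)
    K = np.block([[k, k_fg], [k_fg.T, k_gg]])
    K += jitter * np.eye(K.shape[0])           # small jitter for numerical stability
    ks, ks_fg, _ = rbf_blocks(x_test, x_train, ell)
    K_star = np.hstack([ks, ks_fg])            # cov(f(x_test), [f(X), f'(X)])
    return K_star @ np.linalg.solve(K, np.concatenate([y, dy]))

def gpr_mean(x_train, y, x_test, ell=1.0, jitter=1e-10):
    """Posterior mean of a value-only GP, for comparison."""
    k, _, _ = rbf_blocks(x_train, x_train, ell)
    ks, _, _ = rbf_blocks(x_test, x_train, ell)
    return ks @ np.linalg.solve(k + jitter * np.eye(len(x_train)), y)

# Toy forward model (an assumption for illustration): y(p) = sin(p).
x_train = np.linspace(0.0, 2.0 * np.pi, 5)
x_test = np.linspace(0.0, 2.0 * np.pi, 200)
y, dy = np.sin(x_train), np.cos(x_train)

err_val = np.max(np.abs(gpr_mean(x_train, y, x_test) - np.sin(x_test)))
err_grad = np.max(np.abs(gegpr_mean(x_train, y, dy, x_test) - np.sin(x_test)))
print(f"max error, values only:      {err_val:.3e}")
print(f"max error, values+gradients: {err_grad:.3e}")
```

In this toy setting each training point contributes two conditions instead of one, which is the mechanism behind the efficiency gains described above; the sketch says nothing about the size of those gains for the paper's test problems.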

Abstract

Generating simulated training data needed for constructing sufficiently accurate surrogate models to be used for efficient optimization or parameter identification can incur a huge computational effort in the offline phase. We consider a fully adaptive greedy approach to the computational design of experiments problem using gradient-enhanced Gaussian process regression as surrogates. Designs are incrementally defined by solving an optimization problem for accuracy given a certain computational budget. We address not only the choice of evaluation points but also of required simulation accuracy, both of values and gradients of the forward model. Numerical results show a significant reduction of the computational effort compared to just position-adaptive and static designs as well as a clear benefit of including gradient information into the surrogate training.
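The trade-off between simulation accuracy and computational budget can be illustrated with a deliberately simplified stand-in, which is not the paper's design problem. Assume each evaluation point `i` has a tolerance `eps_i`, costs `eps_i ** (-s)` units of work (a power law with a hypothetical exponent `s`), and contributes `a_i * eps_i**2` to an error measure with made-up sensitivity weights `a_i`. Minimizing the total error under a fixed work budget then has a closed-form solution from the Lagrange conditions, `eps_i` proportional to `a_i ** (-1 / (s + 2))`. All names below are illustrative.

```python
import numpy as np

def allocate_tolerances(weights, budget, work_exponent):
    """Minimize sum(a_i * eps_i**2) subject to sum(eps_i**(-s)) == budget.

    Toy model only: the quadratic error term and the weights are assumptions.
    """
    a = np.asarray(weights, dtype=float)
    s = float(work_exponent)
    shape = a ** (-1.0 / (s + 2.0))                 # stationarity: eps_i ~ a_i^(-1/(s+2))
    scale = (np.sum(shape ** (-s)) / budget) ** (1.0 / s)  # rescale to spend the budget
    return scale * shape

def total_work(eps, work_exponent):
    """Power-law work model: tighter tolerances cost more."""
    return float(np.sum(np.asarray(eps) ** (-work_exponent)))

def weighted_error(eps, weights):
    """Toy error measure: weighted sum of squared tolerances."""
    return float(np.sum(np.asarray(weights) * np.asarray(eps) ** 2))

weights = [1.0, 4.0, 0.25]   # hypothetical sensitivities of the three points
eps = allocate_tolerances(weights, budget=1e4, work_exponent=2.0)
print("tolerances:", eps)
print("work spent:", total_work(eps, 2.0))
print("toy error: ", weighted_error(eps, weights))
```

Points with a larger weight receive a tighter tolerance, and the budget is spent exactly. The paper's actual objective couples the points through the GP surrogate, so its solutions need not have this simple structure.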

Paper Structure

This paper contains 15 sections, 3 theorems, 19 equations, 7 figures, and 4 tables.

Key Result

Theorem 3.1

Assume that $y$ is twice continuously differentiable with uniformly bounded first and second derivatives in a neighborhood $B$ of a minimizer $p^*\in\mathcal{X}$ of $J(p;y^m)$ for some measurement data $y^m$, such that the residual $\left\| y(p^*)-y^m\right\|_{\Sigma_l^{-1}}$ is sufficiently small a

Figures (7)

  • Figure 1: Sketch of the design problem \eqref{eq:doe-incremental-reduced} for $n=2$ points. Level lines of the objective $E(v)$ without gradient data are drawn by blue lines, whereas those of the budget constraint are indicated by dashed lines. The gradient-based version of $E(v)$ is drawn by solid green lines. Left: For $s>1$ there is a unique non-sparse solution. Middle: A smaller correlation length makes sparsity even less likely. Right: For $s<1$, the admissible sets are non-convex, and we may expect multiple local sparse minimizers.
  • Figure 2: Contour plot of the local error density. Red points show the initial design. Adaptively added data points are indicated by black dots. Small markers indicate low accuracy. Gradient information is indicated by red triangles. The color mapping shows the estimated local reconstruction error evaluated on a dense grid of $10^3$ points. The designs were obtained using an incremental budget of $\Delta W = 10^{4}$ with a desired tolerance of $\mathrm{TOL} = 0.01$.
  • Figure 3: Estimated global error $E(\mathcal{D})$ versus accumulated computational work in GEGPR surrogates. Left: $E(\mathcal{D})$ for different incremental work $\Delta W$. Solid lines with gradient data. Right: Different uniform tolerances in position-adaptive designs compared with different curves for $\Delta W_g = 100$.
  • Figure 4: Log-histogram of ${e}_i /\tilde{e}_i$ for value-based GPR (blue) and GEGPR (orange) surrogates.
  • Figure 5: Left: Scatterometry setup used for the characterization of periodic nanostructures on surfaces. The incident light is varied at angles $\theta$ and $\phi$ and the diffraction patterns are recorded. Right: Geometry parametrized in terms of radii $r_{\mathrm{top}}$ and $r_{\mathrm{bot}}$, height of the grating, side wall angle (swa), critical dimension (cd), and oxide layer thickness ($t$).
  • ...and 2 more figures

Theorems & Definitions (7)

  • Remark
  • Remark
  • Theorem 3.1
  • Theorem 3.2
  • proof
  • Remark
  • Corollary 3.2.1