Quadrature Sampling of Parametric Models with Bi-fidelity Boosting

Nuojin Cheng, Osman Asif Malik, Yiming Xu, Stephen Becker, Alireza Doostan, Akil Narayan

TL;DR

This work tackles the costly task of building emulators for parametric maps in forward uncertainty quantification (UQ) by combining quadrature-based least squares (LS) regression with a novel bi-fidelity boosting (BFB) framework. BFB uses a cheap low-fidelity model to select an effective sketch from a set of candidates and then applies that sketch to the expensive high-fidelity data, achieving a residual close to that of the ideal boosted solution with substantially fewer high-fidelity evaluations. The paper provides pre-asymptotic and asymptotic analyses, including optimality bounds and Gaussian-sketch correlation results, and validates the approach on synthetic and PDE datasets, showing reduced data requirements and improved regression accuracy relative to the non-boosted solution. The results offer practical guidance on when boosting helps, which is governed by the correlation between low- and high-fidelity data, and establish connections to leverage-score and volume-based sketching techniques for efficient surrogate construction in PDE UQ workflows.

Abstract

Least squares regression is a ubiquitous tool for building emulators (a.k.a. surrogate models) of problems across science and engineering for purposes such as design space exploration and uncertainty quantification. When the regression data are generated using an experimental design process (e.g., a quadrature grid) involving computationally expensive models, or when the data size is large, sketching techniques have shown promise to reduce the cost of the construction of the regression model while ensuring accuracy comparable to that of the full data. However, random sketching strategies, such as those based on leverage scores, lead to regression errors that are random and may exhibit large variability. To mitigate this issue, we present a novel boosting approach that leverages cheaper, lower-fidelity data of the problem at hand to identify the best sketch among a set of candidate sketches. This in turn specifies the sketch of the intended high-fidelity model and the associated data. We provide theoretical analyses of this bi-fidelity boosting (BFB) approach and discuss the conditions the low- and high-fidelity data must satisfy for a successful boosting. In doing so, we derive a bound on the residual norm of the BFB sketched solution relating it to its ideal, but computationally expensive, high-fidelity boosted counterpart. Empirical results on both manufactured and PDE data corroborate the theoretical analyses and illustrate the efficacy of the BFB solution in reducing the regression error, as compared to the non-boosted solution.
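
The boosting idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes leverage-score row sampling with replacement as the candidate sketch, assumes the design matrix `A` already includes any quadrature weights, and uses the full low-fidelity residual norm as the selection criterion. The names `leverage_sketch`, `sketched_lstsq`, `bfb`, and `f_hi` are hypothetical.

```python
import numpy as np

def leverage_sketch(A, m, rng):
    """Draw m row indices with probability proportional to leverage scores.

    Returns the sampled indices and the usual 1/sqrt(m * p) row rescaling.
    """
    Q, _ = np.linalg.qr(A)              # thin QR: Q has orthonormal columns
    lev = np.sum(Q**2, axis=1)          # leverage score of each row
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])
    return idx, w

def sketched_lstsq(A, b_rows, idx, w):
    """Solve min_x || S A x - S b || using only the sampled rows."""
    SA = w[:, None] * A[idx]
    Sb = w * b_rows
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

def bfb(A, b_lo, f_hi, m, L, seed=0):
    """Bi-fidelity boosting sketch (assumed form, for illustration only).

    A    : (N, d) design matrix
    b_lo : (N,) low-fidelity data, available at every row
    f_hi : callable mapping row indices to high-fidelity values
    m    : rows per sketch; L : number of candidate sketches
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(L):
        idx, w = leverage_sketch(A, m, rng)
        x_lo = sketched_lstsq(A, b_lo[idx], idx, w)
        res = np.linalg.norm(A @ x_lo - b_lo)   # full low-fidelity residual
        if best is None or res < best[0]:
            best = (res, idx, w)
    _, idx, w = best
    b_hi_rows = f_hi(idx)                        # at most m expensive evaluations
    return sketched_lstsq(A, b_hi_rows, idx, w), idx
```

In this sketch the high-fidelity model is queried only at the rows of the selected sketch, so all `L` candidates are compared at low-fidelity cost. How well the selected sketch transfers depends on how closely the low- and high-fidelity data are correlated, which is the condition the paper analyzes.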

Paper Structure (26 sections, 9 theorems, 109 equations, 9 figures, 4 tables, 3 algorithms)

Key Result

Theorem 3.2

Fix a positive integer $L$ and suppose $\delta, \varepsilon \in (0,1]$. If $\{\bm S_\ell\}_{\ell\in [L]}$ is a sequence of i.i.d. random matrices whose distribution is an $({\varepsilon}, \frac{\delta}{L})$ pair for $(\bm Q, \bm h)$, where then with probability at least $1-\delta$, where $\nu$ denotes the absolute correlation coefficient between $\bm P_{\bm Q_\perp}\bm b$ and $\bm P_{\bm Q_\perp

Figures (9)

  • Figure 1: Scatter plots of $\mu(\bm b, \bm S_{\ell^*}) - \mu(\bm b, \bm S_{\ell^{**}})$ based on given values of $\nu$ for Gaussian sketch (red) and leverage score sketch (blue). The green curve is the bound we provide in Theorem \ref{thm:optimality} with ${\varepsilon}=0.01$.
  • Figure 2: Scatter plots of the square of the optimality coefficient for high- and low-fidelity data for each of $100$ different sketches. Each point is equal to $(\mu^2(\tilde{\bm b}, \bm S), \mu^2(\bm b, \bm S))$ for one realization of the sketch $\boldsymbol{S}$. The top and bottom panels correspond to the sketches constructed using Gaussian and leverage score sampling sketches, respectively.
  • Figure 3: Illustration of the temperature-driven cavity flow problem, reproduced from Figure 5 of fdki17.
  • Figure 4: Scatter plots of the square of the optimality coefficient for high- and low-fidelity data from the cavity fluid flow problem for different polynomial spaces (top: total degree; bottom: hyperbolic cross) and types of sampling. Each point is equal to $(\mu^2(\tilde{\bm b}, \bm S), \mu^2(\bm b, \bm S))$ for one realization of the sketch $\bm S$, and each subplot contains 100 points (i.e., is based on 100 sketch realizations). For the total degree space, $m=30$ samples are used, and for the hyperbolic cross space, $m=20$ samples are used. The corresponding correlation coefficients are presented in Table \ref{table:cavity-flow}.
  • Figure 5: Relative error for different sampling methods and polynomial spaces when fitting the surrogate model to the cavity fluid flow data. Yellow lines show the relative error $E$ in (\ref{eqn:rel_err}) for the unsketched solution in \ref{myleastsquares}. Blue lines show $E$ when the coefficients $\bm x$ are computed via the QR decomposition-based method in Section \ref{sssec:qr_sampling}. The blue box plots show the distribution of $E$ based on 1000 trials when $\bm x$ is computed as in \ref{eq:xast-def}. The orange box plots show the same distribution for the solution $\hat{\boldsymbol{x}}_{{\mathsf {BFB}}}$ computed via Algorithm \ref{alg:BFB}.
  • ...and 4 more figures

Theorems & Definitions (21)

  • Definition 2.1: $({\varepsilon},\delta)$ pair condition
  • Theorem 3.2
  • Remark 3.3
  • Theorem 3.4
  • proof: Proof of Theorem \ref{thm:BFQS-error}
  • Theorem 3.5
  • Remark 3.6
  • Lemma 3.7
  • proof
  • Proposition 3.8
  • ...and 11 more