Global sensitivity analysis with limited data via sparsity-promoting D-MORPH regression: Application to char combustion

Dongjin Lee, Elle Lavichant, Boris Kramer

TL;DR

Global sensitivity analysis often suffers from high computational cost when models are expensive. This work introduces a sparsity-promoting D-MORPH regression to train a polynomial dimensional decomposition (PDD) surrogate under limited data, achieving accurate variance-based sensitivity estimates with as little as $15\%$ of the usual training samples. By combining an initial sparse Lasso solution with a D-MORPH refinement through a non-homogeneous ODE and an iterative update, the method yields robust PDD coefficients and improved accuracy over conventional regression in underdetermined settings. Demonstrations on the Ishigami–Homma and Oakley–O'Hagan functions, and a char combustion model, show substantial data-efficiency gains and improved sensitivity estimates, with the char combustion case requiring only $151$ samples. The approach thus enables efficient global sensitivity analysis for expensive, high-dimensional problems with limited data.
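To make the link between PDD coefficients and variance-based sensitivity estimates concrete, here is a minimal sketch. It assumes the surrogate is expanded in an orthonormal polynomial basis, in which case the output variance is the sum of the squared non-constant coefficients and a first-order index is the share carried by terms that depend on one input only. The function name `first_order_indices`, the degree-tuple keys, and the coefficient values are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
def first_order_indices(coeffs, n_inputs):
    """First-order sensitivity indices from orthonormal expansion coefficients.

    coeffs maps a tuple of per-input polynomial degrees to a coefficient,
    e.g. (1, 0, 2) is degree 1 in X1, degree 0 in X2, degree 2 in X3.
    Assumes an orthonormal basis, so each squared non-constant coefficient
    is that term's contribution to the output variance.
    """
    # Total variance: every term except the constant (all degrees zero).
    variance = sum(c * c for degs, c in coeffs.items() if any(degs))
    indices = []
    for i in range(n_inputs):
        # Univariate terms in X_i: positive degree in input i, zero elsewhere.
        v_i = sum(
            c * c
            for degs, c in coeffs.items()
            if degs[i] > 0 and all(d == 0 for j, d in enumerate(degs) if j != i)
        )
        indices.append(v_i / variance)
    return variance, indices


# Made-up coefficients for a three-input toy surrogate (illustration only).
coeffs = {
    (0, 0, 0): 3.0,  # constant term = surrogate mean
    (1, 0, 0): 2.0,  # univariate in X1
    (0, 2, 0): 1.0,  # univariate in X2
    (1, 0, 1): 2.0,  # X1-X3 interaction: counts toward variance, not first order
}
variance, S = first_order_indices(coeffs, n_inputs=3)
print(variance, S)
```

Once the coefficients are available, the indices cost only a few sums, which is why the accuracy of the coefficient estimates is the deciding factor when data are limited.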

Abstract

In uncertainty quantification, variance-based global sensitivity analysis quantitatively determines the effect of each input random variable on the output by partitioning the total output variance into contributions from each input. However, computing conditional expectations can be prohibitively costly when working with expensive-to-evaluate models. Surrogate models can accelerate this, yet their accuracy depends on the quality and quantity of training data, which is expensive to generate (experimentally or computationally) for complex engineering systems. Thus, methods that work with limited data are desirable. We propose a diffeomorphic modulation under observable response preserving homotopy (D-MORPH) regression to train a polynomial dimensional decomposition surrogate of the output while minimizing the amount of training data required. The new method first computes a sparse Lasso solution and uses it to define the cost function. A subsequent D-MORPH regression minimizes the difference between the D-MORPH and Lasso solutions. The resulting D-MORPH-based surrogate is more robust to input variations and more accurate with limited training data. We illustrate the accuracy and computational efficiency of the new surrogate for global sensitivity analysis using mathematical functions and an expensive-to-simulate model of char combustion. The new method is highly efficient, requiring only 15% of the training data needed by conventional regression.
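The two-stage idea in the abstract can be sketched in a heavily simplified form. The snippet below is not the authors' algorithm: it replaces the D-MORPH ODE and iterative update with a closed-form step that, for an underdetermined system $A\mathbf{c}=\mathbf{b}$, returns the interpolating coefficient vector closest in the Euclidean norm to a sparse reference solution. The ISTA Lasso solver, the penalty `alpha`, the random design matrix, and all function names are assumptions made for illustration.

```python
import numpy as np


def lasso_ista(A, b, alpha=0.05, n_iter=500):
    """Plain ISTA for min_c 0.5*||A c - b||^2 + alpha*||c||_1 (illustrative)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * A.T @ (A @ c - b)                             # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft threshold
    return c


def closest_interpolant(A, b, c_ref):
    """Return the c with A c = b that minimizes ||c - c_ref||_2.

    Simplified stand-in for the refinement stage: it keeps the fit to the
    data exact while staying as close as possible to the sparse reference.
    """
    return c_ref + np.linalg.pinv(A) @ (b - A @ c_ref)


rng = np.random.default_rng(0)
M, L = 30, 100                      # fewer samples than coefficients
A = rng.standard_normal((M, L))     # hypothetical basis matrix
c_true = np.zeros(L)
c_true[[0, 3, 7]] = [1.0, -2.0, 0.5]
b = A @ c_true                      # noise-free synthetic outputs

c_lasso = lasso_ista(A, b)                      # stage 1: sparse reference
c_refined = closest_interpolant(A, b, c_lasso)  # stage 2: exact fit near reference

print("Lasso residual:  ", np.linalg.norm(A @ c_lasso - b))
print("Refined residual:", np.linalg.norm(A @ c_refined - b))
```

The design choice to illustrate is that the sparse solution acts as a prior: among the infinitely many coefficient vectors that reproduce the training data, the refinement picks one near the sparse reference rather than the minimum-norm solution.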

Paper Structure

This paper contains 39 sections, 45 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Flow chart for computing the PDD surrogate with the proposed D-MORPH regression for global sensitivity analysis.
  • Figure 2: (a) Convergence of PDD estimates computed by a Lasso-based D-MORPH regression for mean and standard deviation; (b) sensitivity indices $S_{\{1\}}, S_{\{2\}}, S_{\{3\}}$ as D-MORPH iteration increases from 0 to 300.
  • Figure 3: Boxplots of the standard deviation of the random output $y(\mathbf{X})$, estimated by the bivariate fifth-order PDD using the Lasso (LAS) and Lasso-based D-MORPH regressions with weight values $\lambda=0.2$, $0.6$, $1.0$ (DM0.2, DM0.6, DM1.0). Two underdetermined systems are considered: $M=337$ and $M=788$, which correspond to 30% and 70% of the number ($L=1,126$) of expansion coefficients. Each regression is repeated 20 times. The exact solution is shown as a gray-dotted line.
  • Figure 4: Boxplots of first-order sensitivity indices $S_{\{i\}}$, $i=1,2,3,4,5,6$, of the random output $y(\mathbf{X})$, in (a), (b), (c), (d), (e), (f), respectively, estimated by the bivariate fifth-order PDD using the Lasso (LAS) and Lasso-based D-MORPH regressions with weight values $\lambda=0.2$, $0.6$, $1.0$ (DM0.2, DM0.6, DM1.0). Two underdetermined systems are considered: $M=337$ and $M=788$, which correspond to 30% and 70% of the number ($L=1,126$) of expansion coefficients. Each regression is repeated 20 times. The exact solution is shown as a gray-dotted line.
  • Figure 5: Fluidized bed for char combustion: (a) the schematic diagram shows the geometry and the initial concentration of glass beads with a diameter of $X_3$ stacked at the boiler's bottom to a height of $X_1$. Char with a diameter of $X_4$ is fed in through the left side; (b) the cell model used to predict gas behavior consists of 2,520 cells; and (c) the parcel model used to predict solid behavior contains 32,945 parcels.
  • ...and 6 more figures
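
The coefficient count $L=1,126$ quoted in the captions of Figures 3 and 4 can be checked with a short calculation. The sketch below assumes a common counting rule for a truncated PDD, $L = 1 + \sum_{s=1}^{S}\binom{N}{s}\binom{m}{s}$, and assumes $N=15$ inputs (the dimension of the Oakley–O'Hagan function); the captions themselves state only "bivariate" ($S=2$) and "fifth-order" ($m=5$), so both the rule and the input dimension are assumptions here. The function name `pdd_basis_count` is hypothetical.

```python
from math import comb


def pdd_basis_count(n_inputs, order, max_interaction):
    """Number of basis functions in a truncated PDD (assumed counting rule).

    One constant term plus, for each interaction level s, the number of
    input subsets of size s times the number of degree combinations.
    """
    return 1 + sum(
        comb(n_inputs, s) * comb(order, s) for s in range(1, max_interaction + 1)
    )


# Assumed setting: 15 inputs, fifth-order, at most bivariate interactions.
L = pdd_basis_count(n_inputs=15, order=5, max_interaction=2)  # 1 + 75 + 1050 = 1126

# Sample sizes quoted in the captions, as fractions of the coefficient count.
fractions = {M: M / L for M in (337, 788)}
print(L, fractions)
```

Under these assumptions the count matches the caption, and the two sample sizes come out at roughly 30% and 70% of $L$, consistent with the stated underdetermined settings.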