An active learning approach for improving the performance of equilibrium based chemical simulations

Mary Savino, Céline Lévy-Leduc, Marc Leconte, Benoit Cochepin

TL;DR

The paper tackles the high computational cost of equilibrium-based chemical simulations by introducing a sequential active-learning framework that treats the target function as a sample from a Gaussian Process. By iteratively selecting evaluation points with maximal predictive uncertainty and learning anisotropic length scales via maximum likelihood, the method builds accurate surrogates with far fewer function evaluations than dense sampling. It compares Squared Exponential and Matérn covariances, demonstrates strong performance in 1D, 2D, and 6D geochemical settings (including calcite and dolomite precipitation) and analyzes stopping criteria to balance accuracy and efficiency. The approach offers a practical, low-tuning, scalable alternative for complex reactive-transport problems and other geochemical simulations, with potential extension to more intricate systems.

Abstract

In this paper, we propose a novel sequential data-driven method for dealing with equilibrium-based chemical simulations, which can be seen as a specific machine learning approach called active learning. The underlying idea of our approach is to consider the function to estimate as a sample of a Gaussian process, which allows us to compute the global uncertainty on the function estimation. Thanks to this estimation and with almost no parameters to tune, the proposed method sequentially chooses the most relevant input data at which the function to estimate has to be evaluated to build a surrogate model. Hence, the number of evaluations of the function to estimate is dramatically reduced. Our active learning method is validated through numerical experiments and applied to a complex chemical system commonly used in geoscience.
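
The sequential idea described in the abstract can be illustrated with a minimal sketch: fit a Gaussian process to the points evaluated so far, then evaluate the function next where the posterior variance is largest. This is not the authors' implementation. The names `sq_exp_kernel`, `gp_posterior`, `active_learning`, and `toy_f` are hypothetical, the length scale is fixed by hand rather than learned by maximum likelihood, candidates are restricted to a 1D grid, and the variance-threshold stopping rule is a simple stand-in for the stopping criteria studied in the paper.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.15, variance=1.0):
    """Squared exponential covariance between 1D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_cand, variance=1.0, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP at the candidate points."""
    K = sq_exp_kernel(x_train, x_train, variance=variance)
    K += jitter * np.eye(len(x_train))          # numerical stabilisation
    K_s = sq_exp_kernel(x_train, x_cand, variance=variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = variance - np.sum(v ** 2, axis=0)     # prior variance minus explained part
    return mean, np.maximum(var, 0.0)

def active_learning(f, x_cand, n_init=3, max_evals=25, var_tol=1e-6, seed=0):
    """Evaluate f sequentially at the candidate with the largest posterior variance.

    Assumption: var_tol is an illustrative stopping rule, not one from the paper.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x_cand), size=n_init, replace=False)
    x_train = x_cand[idx]
    y_train = np.array([f(x) for x in x_train])  # each call stands for a costly simulation
    while len(x_train) < max_evals:
        _, var = gp_posterior(x_train, y_train, x_cand)
        j = int(np.argmax(var))
        if var[j] < var_tol:                     # surrogate is confident everywhere
            break
        x_train = np.append(x_train, x_cand[j])
        y_train = np.append(y_train, f(x_cand[j]))
    mean, var = gp_posterior(x_train, y_train, x_cand)
    return x_train, y_train, mean, var

def toy_f(x):
    """Hypothetical smooth stand-in for an expensive equilibrium computation."""
    return np.sin(2.0 * np.pi * x) + 0.5 * x

grid = np.linspace(0.0, 1.0, 200)
x_used, y_used, surrogate, post_var = active_learning(toy_f, grid)
print("evaluations used:", len(x_used))
print("max abs surrogate error on grid:", np.max(np.abs(surrogate - toy_f(grid))))
```

In this sketch the costly function is called once per selected point, so the evaluation budget is the length of `x_used`. Replacing the fixed `length_scale` with one estimated from the data at each iteration would bring the sketch closer to the setting the abstract describes.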

Paper Structure

This paper contains 13 sections, 26 equations, 14 figures, 1 algorithm.

Figures (14)

  • Figure 1: Functions $f$ to estimate when $d=1$ (left) and $d=2$ (right).
  • Figure 2: Illustration of our active learning approach for estimating the function displayed in the left part of Figure \ref{fig:courbes} by starting from $t_1=3$ observations randomly chosen in $\textrm{A}$ with the squared exponential covariance function.
  • Figure 3: Average and standard deviation of different statistical measures for the squared exponential covariance function defined in (\ref{eq:cov_gauss}) (left) and for the Matérn covariance function defined in (\ref{eq:cov_maternp}) (right) in the case $d=1$.
  • Figure 4: Left: Statistical assessment of the error estimation of $f$ displayed in the left part of Figure \ref{fig:courbes} for the stopping criteria defined in (\ref{eq:Rk_seuil}), (\ref{eq:Ml_seuil}) and (\ref{eq:V_seuil}) for the squared exponential and the Matérn covariance functions. Top right: Number of evaluations required for the considered stopping criteria. Bottom right: Values of $V(t^\star)$ where $V$ is defined in (\ref{eq:V}) and $t^\star$ is the stopping iteration, which changes from one stopping criterion to another.
  • Figure 5: Illustration of our active learning approach for estimating the function displayed in the right part of Figure \ref{fig:courbes} by starting from $t_1=3$ observations randomly chosen in $\textrm{A}\subset [0,1]^2$ for the squared exponential covariance function.
  • ...and 9 more figures