Lifelong Machine Learning Potentials for Chemical Reaction Network Explorations

Marco Eckhoff, Markus Reiher

TL;DR

This work addresses the high computational cost of exploring chemical reaction networks by introducing lifelong machine learning potentials (lMLPs) that continually adapt to new data via lifelong adaptive data selection (lADS). Leveraging universal eeACSF descriptors, ensemble uncertainty, and a Δ-learning approach with PBE-GFN2, the authors demonstrate that lMLPs can achieve chemical accuracy in rolling CRN explorations while dramatically reducing retraining costs. Compared with conventional iterative learning, continual learning with lADS preserves prior knowledge and efficiently integrates new data, yielding substantial improvements in energy and force prediction accuracy and enabling on-the-fly CRN exploration. The results suggest a practical pathway toward reliable, uncertainty-aware CRN predictions and adaptive refinement using limited, targeted high-level calculations.
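The combination of Δ-learning and ensemble uncertainty mentioned above can be illustrated with a minimal sketch. It assumes that "PBE-GFN2" means the model learns the difference between a PBE reference energy and a GFN2 baseline energy, and that the uncertainty is taken as the standard deviation over ensemble members. The function names, the threshold, and the numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def delta_learning_prediction(e_baseline, delta_members):
    """Add the ensemble-averaged learned correction to a baseline energy.

    e_baseline: energy from the cheap method (assumed here: GFN2).
    delta_members: correction predicted by each ensemble member
                   (assumed target: E_PBE - E_GFN2).
    Returns the corrected energy and the ensemble spread.
    """
    e_pred = e_baseline + mean(delta_members)
    # Sample standard deviation over members as a simple uncertainty proxy.
    uncertainty = stdev(delta_members)
    return e_pred, uncertainty


def needs_reference_calculation(uncertainty, threshold):
    """Flag a structure for a new high-level calculation (hypothetical rule)."""
    return uncertainty > threshold


# Illustrative numbers only, in arbitrary energy units.
e_pred, unc = delta_learning_prediction(-10.0, [0.1, 0.2, 0.3])
flagged = needs_reference_calculation(unc, threshold=0.05)
```

In such a scheme, only flagged structures would trigger targeted high-level calculations, whose results are then added to the training data.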

Abstract

Recent developments in computational chemistry facilitate the automated quantum chemical exploration of chemical reaction networks for the in-silico prediction of synthesis pathways, yield, and selectivity. However, the underlying quantum chemical energy calculations require vast computational resources, limiting these explorations severely in practice. Machine learning potentials (MLPs) offer a solution to increase computational efficiency, while retaining the accuracy of reliable first-principles data used for their training. Unfortunately, MLPs will be limited in their generalization ability within chemical (reaction) space, if the underlying training data are not representative for a given application. Within the framework of automated reaction network exploration, where new reactants or reagents composed of any elements from the periodic table can be introduced, this lack of generalizability will be the rule rather than the exception. Here, we therefore evaluate the benefits of the lifelong MLP concept in this context. Lifelong MLPs push their adaptability by efficient continual learning of additional data. We propose an improved learning algorithm for lifelong adaptive data selection yielding efficient integration of new data while previous expertise is preserved. In this way, we can reach chemical accuracy in reaction search trials.

Paper Structure

This paper contains 14 sections, 3 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: (a) In iterative learning, a new machine learning model is trained from scratch on all data to integrate additional training data. (b) In lifelong learning, the training of a model is continued employing added data and only a subset of previous data to prevent forgetting. In this way, previously acquired knowledge of the model is exploited. We note that lifelong learning is here only based on rehearsal of data, while it can also exploit model parameter regularization and the model architecture.
  • Figure 2: Key conceptual elements of lMLPs denoted in black font. Example methods are denoted in gray (see main text for details). Reference data need to be represented by a universal descriptor to be learned by the lMLP. If simulations encounter lMLP predictions of high uncertainty, active learning tackles this issue in the framework of a continual learning approach for an efficient adaptation of the lMLP. Moreover, results obtained in such simulations may point to follow-up simulations, e.g., on related systems, which may then require training on additional incoming data. Data points integrated by continual learning require neither training from scratch nor training on all data.
  • Figure 3: Simplified scheme of the core concept behind lADS to continuously reduce and clean the data during training. (I) Data is redundant if it is seldom trained but still well represented. (II) Data is likely to be inconsistent if it is very often trained but still badly represented.
  • Figure 4: Test RMSEs of (a) energies $E^\mathrm{test}$ and (b) atomic force components $F_{\alpha,n}^\mathrm{test}$ after training for $1\,000$ epochs on the (extended) data. Trainings were carried out on a sequence of $1$, $2$, $4$, $8$, and $16$ data sets. The total number of epochs $N_\mathrm{epochs}$ is higher for training in more sets, reflecting the exploitation of previously acquired knowledge. A constant fraction of $\tfrac{1}{30}$ of all training structures in the respective data set was utilized per epoch. In this way, we can compare results obtained with the same number of structure evaluations, whereby we do not count the evaluations required for training the previous MLP. To avoid instabilities in lADS due to the missing adaptation of the number of fitted structures per epoch, we applied here $T_1=0.75$, $N_{-}=40$, $N_{++}=400$, and $p_\mathrm{redun}^\mathrm{max}=0.015$. In this figure and in Figures \ref{fig:lMLP_constant} and \ref{fig:lMLP_random}, $n_\mathrm{epoch}\,N_\mathrm{epochs}^{-1}$ represents a relative scale for the learning curves on the test data. In this figure, the graphs are based on the same number of training data wherever dots are plotted at a given value of $n_\mathrm{epoch}\,N_\mathrm{epochs}^{-1}$. These dots represent RMSEs of individual HDNNP ensemble members, lines show their mean, and shaded areas span their range. The black dashed line represents the mean RMSE of training from scratch.
  • Figure 5: Test RMSEs of (a) energies $E^\mathrm{test}$ and (b) atomic force components $F_{\alpha,n}^\mathrm{test}$ for training the data in $1$, $2$, $4$, $8$, and $16$ sets. All these trainings performed the same number of structure evaluations to obtain a fair comparison. For this reason, we replaced the adaptation of the number of fitted structures per epoch in lADS by a constant number of $N_\mathrm{fit}=750$ structures and trained for $N_\mathrm{epochs}=16\,000$ epochs in total. We readjusted $p_\mathrm{redun}^\mathrm{max}$ to $0.0125$ due to the change in $N_\mathrm{fit}$. Spikes in the graphs originate from data additions. The dots represent RMSEs of individual HDNNP ensemble members and lines show their mean.
  • ...and 4 more figures
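
The two rules in the Figure 3 caption (data is redundant if seldom trained but well represented, and likely inconsistent if very often trained but still badly represented) can be sketched as a simple classification. This is only an illustration of the stated idea, not the lADS algorithm itself: the class, the function, and all four thresholds are hypothetical and do not correspond to the parameters quoted in the Figure 4 caption.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    times_trained: int  # how often the structure was used in a fit
    error: float        # current prediction error for the structure


def classify(stats, *, seldom, often, good_error, bad_error):
    """Label one training structure following the two rules of Figure 3.

    All thresholds are hypothetical illustration parameters.
    """
    # (I) Seldom trained but still well represented: candidate for removal.
    if stats.times_trained <= seldom and stats.error <= good_error:
        return "redundant"
    # (II) Very often trained but still badly represented: likely inconsistent.
    if stats.times_trained >= often and stats.error >= bad_error:
        return "inconsistent"
    return "keep"


# Illustrative values only.
limits = dict(seldom=5, often=100, good_error=0.05, bad_error=0.5)
labels = [
    classify(SampleStats(times_trained=2, error=0.01), **limits),
    classify(SampleStats(times_trained=250, error=0.9), **limits),
    classify(SampleStats(times_trained=40, error=0.2), **limits),
]
```

Structures labeled "redundant" or "inconsistent" would be the ones removed from the training data in such a scheme, which is how the data set is continuously reduced and cleaned during training.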