Transfer learning for atomistic simulations using GNNs and kernel mean embeddings

John Falk, Luigi Bonati, Pietro Novelli, Michele Parrinello, Massimiliano Pontil

TL;DR

This work tackles data-efficient transfer learning for modeling the potential energy surface in atomistic systems by uniting pre-trained graph neural network (GNN) representations with kernel mean embeddings. The authors introduce MEKRR, which uses GNN features learned on the large OC20 dataset and learns a PES through kernel ridge regression while enforcing physical symmetries via mean embeddings; they further enrich the kernel with chemically informed, per-species terms. Across realistic catalytic datasets that include out-of-distribution configurations, MEKRR demonstrates superior transferability and accuracy compared to GNNs or ridge regression baselines, and it benefits from a flexible α parameter that blends global and species-specific interactions. The approach offers data-efficient, interpretable PES modeling with potential extensions to MD by incorporating forces and scaling techniques. Overall, MEKRR advances transfer learning in computational chemistry by leveraging foundation-model-like representations within a principled kernel framework to capture complex chemical environments.

Abstract

Interatomic potentials learned using machine learning methods have been successfully applied to atomistic simulations. However, accurate models require large training datasets, while generating reference calculations is computationally demanding. To bypass this difficulty, we propose a transfer learning algorithm that leverages the ability of graph neural networks (GNNs) to represent chemical environments together with kernel mean embeddings. We extract a feature map from GNNs pre-trained on the OC20 dataset and use it to learn the potential energy surface from system-specific datasets of catalytic processes. Our method is further enhanced by incorporating into the kernel the chemical species information, resulting in improved performance and interpretability. We test our approach on a series of realistic datasets of increasing complexity, showing excellent generalization and transferability performance, and improving on methods that rely on GNNs or ridge regression alone, as well as similar fine-tuning approaches.
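
The pipeline described in the abstract (per-atom GNN features, a kernel mean embedding over atoms, a species-aware kernel term, and kernel ridge regression) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The random arrays stand in for pre-trained GNN node features, the energies are synthetic, and the convex combination `(1 - alpha) * k_global + alpha * k_species` is only an assumed form of the $\alpha$-blended kernel; the names `config_kernel`, `gram`, `krr_fit`, and `krr_predict` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale):
    """Pairwise Gaussian kernel between rows of X (n, d) and Y (m, d)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def config_kernel(feats_a, species_a, feats_b, species_b, alpha, lengthscale):
    """Kernel between two configurations built from per-atom features.

    Global term: inner product of the mean embeddings over all atoms.
    Species term: sum over shared species of inner products of the
    per-species mean embeddings. The blend with alpha is an ASSUMED form.
    """
    k = gaussian_kernel(feats_a, feats_b, lengthscale)
    k_global = k.mean()  # averaging over atoms gives permutation invariance
    k_species = 0.0
    for s in np.intersect1d(species_a, species_b):
        block = k[np.ix_(species_a == s, species_b == s)]
        k_species += block.mean()
    return (1.0 - alpha) * k_global + alpha * k_species

def gram(configs_a, configs_b, alpha, lengthscale):
    """Gram matrix between two lists of (features, species) configurations."""
    return np.array([[config_kernel(fa, sa, fb, sb, alpha, lengthscale)
                      for fb, sb in configs_b] for fa, sa in configs_a])

def krr_fit(K, y, reg):
    """Kernel ridge regression coefficients: solve (K + reg * n * I) c = y."""
    n = K.shape[0]
    return np.linalg.solve(K + reg * n * np.eye(n), y)

def krr_predict(K_test_train, coef):
    """Predicted energies for test configurations."""
    return K_test_train @ coef

# Toy data: random vectors stand in for GNN node features (assumption).
rng = np.random.default_rng(0)

def random_config(n_atoms, dim=8):
    feats = rng.normal(size=(n_atoms, dim))
    species = rng.integers(0, 2, size=n_atoms)  # two hypothetical species
    return feats, species

train = [random_config(6) for _ in range(10)]
test = [random_config(6) for _ in range(3)]
y_train = np.array([f.sum() for f, _ in train])  # synthetic "energies"

K_train = gram(train, train, alpha=0.5, lengthscale=4.0)
coef = krr_fit(K_train, y_train, reg=1e-3)
pred = krr_predict(gram(test, train, alpha=0.5, lengthscale=4.0), coef)
```

Because both terms average over atoms, reordering the atoms of a configuration leaves the kernel value unchanged, which is how a mean embedding can enforce permutation symmetry. In this sketch, `alpha=0` uses only the global term and `alpha=1` uses only the per-species terms.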

Paper Structure

This paper contains 33 sections, 13 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: Diagram of MEKRR.
  • Figure 2: Validation error (RMSE / MAE) of MEKRR-(SchNet) on the Cu/formate and Fe/$\mathrm{N}_2$ ($D_2$) datasets as a function of $\alpha$, geometrically spaced on a grid from $0$ to $1$; the optimal $\alpha$ and its error are marked by a bold orange point. The optimal $\alpha$ for the Cu/formate dataset is positive but close to zero, while the optimal $\alpha$ for the Fe/$\mathrm{N}_2$ dataset lies at the boundary, $\alpha = 1.0$, leading to a hard multi-embedding kernel. We see that tuning $\alpha$ improves performance in practice and that the multi-weight formulation \ref{eq:mw-kernel} is practically beneficial.
  • Figure 3: Heatmaps of the $K_{\alpha}$-SchNet kernel evaluated on a part of the $D_2$ trajectory in which a reactive event occurs, together with the time series of the distance between the nitrogen atoms over time $t$. The cases with $\alpha=0$ and $\alpha=1$ are reported on the left and right, respectively. Using spectral clustering with the two kernels as inputs, we label each time index with one of two classes, with the background color showing the class. Spectral clustering with the multi-weight kernel picks out the reactive event perfectly.
  • Figure 4: Heatmaps of the $K_{\alpha}$ kernel evaluated on a part of the $D_2$ trajectory in which a reactive event occurs, together with the time series of the distance between the nitrogen atoms over time $t$. Here $k$ is Gaussian (lengthscale fit with the median heuristic) and the node features are given by the $L$-th layer, $L = 1, \dots, 5$, of the pretrained SchNet GNN, going from the leftmost to the rightmost column. The cases with $\alpha=0$ and $\alpha=1$ are reported above and below, respectively. Using spectral clustering with the two kernels as inputs, we label each time index with one of two classes, with the background color showing the class. The color has been normalized to lie between 0 and 1, which does not affect the clustering or visualization. We can see that output representations from later layers yield more patterned kernel matrices with more erratic clustering. Using the multi-weight kernel $K_{\alpha}$ with $\alpha = 1$ gives better results across the board.
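
The captions of Figures 3 and 4 mention two generic ingredients: a Gaussian kernel whose lengthscale is set by the median heuristic, and two-class spectral clustering applied to a kernel matrix over trajectory frames. The sketch below shows one common way to do both in NumPy. It is an illustration under stated assumptions, not the authors' procedure. The toy "trajectory" is synthetic, the frame features are random stand-ins for kernel inputs, and the two-way split uses the sign of the second eigenvector of the normalized Laplacian, a simplification of the usual k-means step. The function names are hypothetical.

```python
import numpy as np

def median_heuristic(X):
    """Lengthscale = median Euclidean distance between distinct rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff**2).sum(-1))
    upper = np.triu_indices(len(X), k=1)  # each unordered pair once
    return np.median(dist[upper])

def gaussian_gram(X, lengthscale):
    """Gaussian kernel matrix between all pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-(diff**2).sum(-1) / (2.0 * lengthscale**2))

def spectral_bipartition(K):
    """Two-class labels from a kernel (affinity) matrix.

    Builds the symmetric normalized Laplacian L = I - D^{-1/2} K D^{-1/2}
    and thresholds its second-smallest eigenvector at zero. Which class
    is called 0 or 1 is arbitrary.
    """
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    L = np.eye(len(K)) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return (eigvecs[:, 1] > 0).astype(int)

# Synthetic trajectory (assumption): 20 frames before and 20 frames
# after an abrupt change, mimicking a reactive event at frame 20.
rng = np.random.default_rng(0)
before = rng.normal(loc=0.0, scale=0.3, size=(20, 3))
after = rng.normal(loc=3.0, scale=0.3, size=(20, 3))
frames = np.vstack([before, after])

lengthscale = median_heuristic(frames)
K = gaussian_gram(frames, lengthscale)
labels = spectral_bipartition(K)
```

Plotting `labels` against the frame index as a background color, next to a heatmap of `K`, gives the kind of picture the captions describe: a change of label marks the frame at which the kernel detects a change in the configuration.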