Transfer learning from first-principles calculations to experiments with chemistry-informed domain transformation
Yuta Yahagi, Kiichi Obuchi, Fumihiko Kosaka, Kota Matsui
TL;DR
This work tackles the bottleneck of scarce experimental data in materials science by introducing a chemistry-informed domain transformation that bridges first-principles simulations and experiments. The method first maps computational data into the experimental domain using ensemble averaging and a physics-informed conversion function, then applies standard homogeneous domain adaptation to build predictive models with high data efficiency. A case study on catalyst activity for the reverse water-gas shift (RWGS) reaction demonstrates positive transfer: a model that uses abundant density functional theory (DFT) data together with a small amount of experimental data reaches far lower test errors than a model trained from scratch, sometimes by an order of magnitude, while using fewer target data. The approach offers a practical route to accelerating catalyst discovery by integrating theory, computation, and data, potentially reducing the number of laboratory experiments required.
Abstract
Simulation-to-Real (Sim2Real) transfer learning, a machine learning technique that efficiently solves a real-world task by leveraging knowledge from computational data, has received increasing attention in materials science as a promising solution to the scarcity of experimental data. We propose an efficient transfer learning scheme from first-principles calculations to experiments based on a chemistry-informed domain transformation that integrates the heterogeneous source and target domains by harnessing the underlying physics and chemistry. The proposed method maps the computational data from the simulation space (source domain) into the space of experimental data (target domain). During this process, these qualitatively different domains are integrated using two pieces of prior chemical knowledge: (1) the statistical ensemble and (2) the relationship between source and target quantities. As a proof of concept, we predict the catalyst activity for the reverse water-gas shift reaction by using abundant first-principles data in addition to the experimental data. Through this demonstration, we confirm that the transfer learning model exhibits positive transfer in both accuracy and data efficiency. In particular, high accuracy is achieved despite using only a few (fewer than ten) target data in the domain transformation: the prediction error is one order of magnitude smaller than that of a model trained from scratch on over 100 target data. This result indicates that the proposed method achieves high prediction performance with few target data, which helps reduce the number of trials needed in real laboratories.
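To make the two-step idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Boltzmann-weighted average as the statistical ensemble, an Arrhenius-like expression as the source-to-target conversion, and a one-dimensional linear fit as a deliberately simple stand-in for the homogeneous domain adaptation step. All function names and functional forms here are hypothetical and chosen only for illustration.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def ensemble_average(energies_ev, values, temperature_k):
    """Boltzmann-weighted average of per-configuration values.

    Assumption: the statistical ensemble is a canonical one over
    simulated configurations with energies given in eV.
    """
    e_min = min(energies_ev)  # shift energies for numerical stability
    weights = [
        math.exp(-(e - e_min) / (K_B_EV_PER_K * temperature_k))
        for e in energies_ev
    ]
    partition = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / partition


def convert_to_target(barrier_ev, temperature_k):
    """Hypothetical conversion function: Arrhenius-like log-rate.

    Maps a simulated energy barrier (source quantity) onto the scale of
    a log-activity (target quantity), up to an affine correction.
    """
    return -barrier_ev / (K_B_EV_PER_K * temperature_k)


def fit_linear_adapter(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept.

    Stand-in for domain adaptation: once source data live in the target
    space, a few target samples suffice to calibrate the mapping.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In this sketch, each catalyst's simulated barriers would first be reduced to a single value with `ensemble_average`, passed through `convert_to_target`, and then calibrated against the handful of available experimental measurements with `fit_linear_adapter`. The point of the design is that the physics-based steps carry most of the structure, so only two parameters remain to be learned from target data.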
