Optimizing Cross-Domain Transfer for Universal Machine Learning Interatomic Potentials

Jaesun Kim, Jinmu You, Yutack Park, Yunsung Lim, Yujin Kang, Jisu Kim, Haekwan Jeon, Suyeon Ju, Deokgi Hong, Seung Yul Lee, Saerom Choi, Yongdeok Kim, Jae W. Lee, Seungwu Han

TL;DR

SevenNet-Omni addresses the core transferability challenge of universal MLIPs by training across 15 heterogeneous databases with a two-component parameterization: shared parameters $\theta_C$ that represent a common potential-energy surface (PES), and task-specific corrections $\theta_T$. The framework applies selective regularization to $\theta_T$ and adds a domain-bridging set (DBS) of cross-domain evaluations, which aligns the energy surfaces of different datasets and improves cross-domain generalization. Across diverse benchmarks, from torsion barriers to adsorption on metal surfaces and in MOFs, the method achieves state-of-the-art cross-domain accuracy, including adsorption-energy errors below 0.1 eV, and reproduces r$^2$SCAN energetics despite limited r$^2$SCAN training data. Curriculum-driven training and energy-shift alignment enable effective cross-functional transfer from large PBE datasets to hybrid functionals, offering a scalable path toward universal, transferable MLIPs that bridge quantum-mechanical fidelity and chemical diversity.
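The two-component parameterization can be pictured as a training loss in which only $\theta_T$ is penalized. The sketch below is a minimal illustration under stated assumptions: a toy linear energy model stands in for the actual network, and the selective regularizer is taken to be a plain L2 penalty on the task-specific weights. Neither the model form nor the exact regularizer is specified in this summary, and the function and channel names are hypothetical.

```python
import numpy as np

def predict_energy(features, theta_c, theta_t, task):
    """Toy linear energy model (hypothetical): shared weights plus a per-task correction."""
    return features @ (theta_c + theta_t[task])

def selective_l2_loss(batches, theta_c, theta_t, lam):
    """Per-task MSE summed over tasks, plus an L2 penalty on task-specific weights only.

    batches: dict mapping task name -> (features, reference energies)
    theta_t: dict mapping task name -> task-specific weight vector
    lam:     regularization strength applied to theta_t (theta_c is unpenalized)
    """
    data_term = 0.0
    for task, (x, y) in batches.items():
        residual = predict_energy(x, theta_c, theta_t, task) - y
        data_term += float(np.mean(residual ** 2))
    # Selective regularization: only the task-specific corrections are penalized.
    penalty = lam * sum(float(np.sum(w ** 2)) for w in theta_t.values())
    return data_term + penalty

# Usage with two hypothetical fidelity channels that are both fitted exactly.
x = np.eye(2)
theta_c = np.array([1.0, 2.0])
theta_t = {"pbe": np.zeros(2), "r2scan": np.array([0.5, 0.0])}
batches = {"pbe": (x, np.array([1.0, 2.0])), "r2scan": (x, np.array([1.5, 2.0]))}
print(selective_l2_loss(batches, theta_c, theta_t, lam=0.1))  # prints 0.025
```

Because $\theta_C$ carries no penalty, a trend shared by all tasks costs nothing when it lives in $\theta_C$ but costs $\lambda\|\theta_T\|^2$ for every task when it is duplicated in each $\theta_T$. Under this assumed loss, that asymmetry is what pushes common physics into the shared PES.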

Abstract

Accurate yet transferable machine-learning interatomic potentials (MLIPs) are essential for accelerating materials and chemical discovery. However, most universal MLIPs overfit to narrow datasets or computational protocols, limiting their reliability across chemical and functional domains. We introduce a transferable multi-domain training strategy that jointly optimizes universal and task-specific parameters through selective regularization, coupled with a domain-bridging set (DBS) that aligns potential-energy surfaces across datasets. Systematic ablation experiments show that small DBS fractions (0.1%) and targeted regularization synergistically enhance out-of-distribution generalization while preserving in-domain fidelity. Trained on fifteen open databases spanning molecules, crystals, and surfaces, our model, SevenNet-Omni, achieves state-of-the-art cross-domain accuracy, including adsorption-energy errors below 0.06 eV on metallic surfaces and below 0.1 eV on metal-organic frameworks. Although r$^2$SCAN calculations make up only 0.5% of its training data, SevenNet-Omni reproduces high-fidelity r$^2$SCAN energetics, demonstrating effective cross-functional transfer from large PBE datasets. This framework offers a scalable route toward universal, transferable MLIPs that bridge quantum-mechanical fidelities and chemical domains.
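The abstract reports that a DBS making up only about 0.1% of the training data already helps. As a minimal sketch, assuming the fraction is measured against the total number of samples seen per epoch and that DBS entries are simply shuffled in with the main data (the summary does not specify the sampling scheme), an epoch could be assembled as follows. `mix_dbs` is a hypothetical helper, not part of any released code.

```python
import numpy as np

def mix_dbs(n_main, n_dbs_pool, fraction, seed=0):
    """Return shuffled (source, index) pairs for one epoch in which `fraction`
    of all samples come from the domain-bridging set (hypothetical helper).

    Assumptions: `fraction` is relative to the total epoch size, and the DBS
    pool is sampled with replacement only if it is smaller than the quota.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    rng = np.random.default_rng(seed)
    # Solve n_dbs / (n_main + n_dbs) = fraction for n_dbs.
    n_dbs = int(round(fraction * n_main / (1.0 - fraction)))
    dbs_idx = rng.choice(n_dbs_pool, size=n_dbs, replace=n_dbs > n_dbs_pool)
    epoch = [("main", i) for i in range(n_main)] + [("dbs", int(i)) for i in dbs_idx]
    order = rng.permutation(len(epoch))  # interleave DBS samples with the main data
    return [epoch[i] for i in order]

# Usage: a 0.1% DBS share on top of 99,900 main-dataset structures.
epoch = mix_dbs(n_main=99_900, n_dbs_pool=5_000, fraction=0.001)
print(sum(src == "dbs" for src, _ in epoch), len(epoch))  # prints 100 100000
```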

Paper Structure

This paper contains 20 sections, 10 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Schematic overview of the multi-domain training strategy. a Training sets used in 7net-Omni, categorized by material domain and level of quantum-mechanical theory. The brightness of each block reflects the proportion of high-energy, far-from-equilibrium structures, with darker shades indicating a higher fraction of such configurations. b Clustering of task-embedding vectors projected using principal component analysis. c Schematic illustration of potential energy curves obtained from multi-task training under different approaches. Solid lines represent the ground-truth PES from the corresponding ab initio methods, while markers indicate training points from databases $D_1$ and $D_2$. Dashed lines show PESs predicted by multi-task MLIPs: the gray dashed line corresponds to a shared PES, whereas the blue dashed line illustrates a PES obtained using task-specific parameters trained on $D_1$. d Performance of multi-task MLIPs trained on multi-domain datasets. Blue, red, and purple bars denote force errors for the corresponding material domains and fidelities, while the yellow bar indicates the L2 norm of the task-specific parameters.
  • Figure 2: Performance of uMLIPs in cross-domain scenarios. a MAE in predicting torsional energy barriers. The $y$-axis lists the uMLIP models. For multi-task uMLIPs, the inference channel is indicated to the right of each bar; for single-task uMLIPs, the corresponding training set is shown. White bullets mark the accuracy of the hybrid-functional channel (in parentheses) relative to $\omega$B97M-D3 reference results. b MAE of reaction-energy predictions for organometallic complexes. c MAE of cohesive-energy predictions for organic crystals. d MAE of formation-energy predictions for hybrid organic-inorganic perovskites. e MAE of adsorption-energy predictions for molecular inhibitors, computed as the energy change from the physisorbed to the chemisorbed state. f Normalized MAEs on four benchmark tasks for metal-organic frameworks; MAE values were normalized to the maximum MAE across all models. In a-f, all reference DFT data were obtained with PBE-D3. Solid bars in a-e represent the best-performing channel or training database. Individual parity plots are provided in Supplementary Figs. 7-9, 12, and 13.
  • Figure 3: Molecular energy overestimation and PES stiffening. a Error distributions of molecular energies across uMLIP models, shown as violin and box plots. The inference tasks of multi-task models and the datasets used for single-task models are indicated at the top of the plot. Individual data points are randomly jittered along the horizontal axis for visual clarity. b PES for CO$_2$ adsorption on Mg-MOF74 computed with DFT (PBE-D3) and uMLIPs. c Comparison of $E_{\rm ads}$ without deformation for CO$_2$ and H$_2$O between DFT and 7net models. Linear regression was performed for data points with $E_{\rm ads}^{\rm DFT}$ below 0.5 eV; the fitted line equations are shown in the figure insets.
  • Figure 4: Accuracy of uMLIPs for reactions on metal surfaces. a MAE of adsorption energies of *H, *O, *OH, and *CO on five noble metals (Cu, Pd, Pt, Ag, and Au). The $y$-axis lists the uMLIP models. For multi-task uMLIPs, the inference channel is indicated to the right of each bar; for single-task uMLIPs, the corresponding training set is shown. White bullets mark the performance of the RPBE-fidelity channel (in parentheses), compared with RPBE results. b MAE of physisorption-energy predictions for the ADS41 dataset. c MAE of chemisorption-energy predictions for the ADS41 dataset, excluding Co and Ni surfaces. d MAE of chemisorption-energy predictions for adsorption on Co and Ni surfaces in the ADS41 dataset. e Potential energy curves from uMLIPs as a function of the distance between an O atom and a Co or Cu surface. Reference DFT data are calculated with PBE (a) or PBE-D3 (b-d). Solid bars in a-d represent the best-performing channel or training database. Individual parity plots are presented in Supplementary Figs. 21 and 22.
  • Figure 5: Inference speed of uMLIPs. The inference performance of each uMLIP is evaluated using MD simulations of diamond-structure Si. Speed is reported in nanoseconds per day (ns/day), measured on an NVIDIA H100 GPU.
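Figure 1b projects task-embedding vectors with principal component analysis. A minimal PCA projection via singular value decomposition is sketched below; the embedding matrix is a hypothetical input (one row per training task), since the summary does not describe how the embeddings are extracted from the model.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project task-embedding vectors onto their leading principal components.

    embeddings: array of shape (n_tasks, embed_dim), one row per training task.
    Returns an array of shape (n_tasks, n_components).
    """
    x = np.asarray(embeddings, dtype=float)
    centered = x - x.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Usage with three hypothetical 3-D task embeddings lying on a line.
coords = pca_project([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0]])
print(coords.shape)  # prints (3, 2)
```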
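Figure 2f normalizes MAEs to the largest MAE among the compared models. The sketch below assumes the normalization is applied separately to each benchmark task (one column per task) and that all MAEs are positive; `normalized_mae` is a hypothetical helper.

```python
import numpy as np

def normalized_mae(mae_table):
    """Divide each benchmark's MAEs by the largest MAE among the compared models.

    mae_table: array of shape (n_models, n_tasks); entry [m, t] is model m's MAE on task t.
    With positive MAEs the result lies in (0, 1], and 1 marks the worst model on each task.
    """
    mae = np.asarray(mae_table, dtype=float)
    return mae / mae.max(axis=0, keepdims=True)  # column-wise (per-task) normalization

# Usage: two hypothetical models evaluated on two tasks.
print(normalized_mae([[0.1, 2.0], [0.2, 1.0]]))
```

Normalizing per task lets benchmarks with very different error scales share one axis, at the cost of hiding absolute error magnitudes.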
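Figure 3c fits a straight line only to points with $E_{\rm ads}^{\rm DFT}$ below 0.5 eV. A minimal version using NumPy's `polyfit` is sketched below, assuming DFT values on the horizontal axis, uMLIP values on the vertical axis, and the threshold applied literally to the signed DFT adsorption energy; the caption does not spell out either convention, and the data are hypothetical.

```python
import numpy as np

def fit_below_threshold(e_dft, e_mlip, threshold=0.5):
    """Fit e_mlip ≈ slope * e_dft + intercept using only points with e_dft < threshold (eV)."""
    e_dft = np.asarray(e_dft, dtype=float)
    e_mlip = np.asarray(e_mlip, dtype=float)
    mask = e_dft < threshold  # points at or above the threshold are excluded from the fit
    if mask.sum() < 2:
        raise ValueError("need at least two points below the threshold")
    slope, intercept = np.polyfit(e_dft[mask], e_mlip[mask], deg=1)  # highest degree first
    return float(slope), float(intercept)

# Usage: hypothetical data; the last point lies above the threshold and is ignored.
slope, intercept = fit_below_threshold([0.1, 0.2, 0.3, 0.4, 0.9], [0.3, 0.5, 0.7, 0.9, 5.0])
print(f"E_MLIP = {slope:.2f} E_DFT + {intercept:.2f}")
```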
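Figure 5 reports throughput in ns/day. Converting an MD timing run into that unit is simple arithmetic; the step count, timestep, and wall time below are placeholder inputs, since the caption does not give the system size or MD settings.

```python
def ns_per_day(n_steps, timestep_fs, wall_seconds):
    """Convert an MD timing run into simulated nanoseconds per wall-clock day."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    simulated_ns = n_steps * timestep_fs * 1e-6  # 1 fs = 1e-6 ns
    return simulated_ns * 86_400.0 / wall_seconds  # 86,400 s per day

# Usage: hypothetical run of 10,000 steps with a 2 fs timestep taking one hour.
print(ns_per_day(10_000, 2.0, 3600.0))
```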