AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials

Janice Lan; Aini Palizhati; Muhammed Shuaibi; Brandon M. Wood; Brook Wander; Abhishek Das; Matt Uyttendaele; C. Lawrence Zitnick; Zachary W. Ulissi

TL;DR

AdsorbML tackles the computational bottleneck of locating the global-minimum adsorption energy by pairing generalizable ML potentials with a search strategy that relaxes many initial configurations, ranks them by energy, and refines the best candidates. The OC20-Dense dataset is introduced as a standardized benchmark, enabling evaluation across diverse adsorbates and surfaces. The results show large speedups (up to ~2,300×) at competitive success rates (~87%), offering a practical path for high-throughput catalyst screening. The work provides a scalable framework for accurate, efficient adsorption-energy estimation and points to future directions in global optimization and broader chemical-space applicability.

Abstract

Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is to accurately compute the adsorption energy for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low-energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate that machine learning potentials can be leveraged to identify low-energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration 87.36% of the time, while achieving a 2000x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 100,000 unique configurations.

Paper Structure (34 sections, 4 equations, 7 figures, 16 tables)

Introduction
Related Work
Results
OC20-Dense Evaluation
Relaxations
AdsorbML Algorithm
Experiments
Discussion
Methods
OC20-Dense
Evaluation Metrics
Relaxation Constraints
Data Availability
Code Availability
Author Contributions
...and 19 more sections

Figures (7)

Figure 1: An overview of the steps involved in identifying the adsorption energy for an adsorbate-surface combination. First, an adsorbate and surface combination is selected, then numerous configurations are enumerated heuristically and/or randomly. For each configuration, relaxations are performed and systems are filtered based on physical constraints that ensure valid adsorption energies (i.e., no desorption, dissociation, or surface mismatch). The minimum energy across all configurations is identified as the adsorption energy.
Figure 1: Results for SCN-MD-Large, single-points (top) and relaxations (bottom) at $k=5$. Left: distribution of differences between predicted and ground truth adsorption energies. Lower is better, meaning that AdsorbML found a better binding site. Differences within 0.1 eV are also considered comparable and a success, represented in teal. Red bars are failure cases. Right: an aggregation of the major categories of energy differences. Results reported on the validation set.
Figure 2: The AdsorbML algorithm. Initial configurations are generated via heuristic and random strategies. ML relaxations are performed on GPUs and ranked in order of lowest to highest energy. The best $k$ systems are passed on to DFT for either a single-point (SP) evaluation or a full relaxation (RX) from the ML-relaxed structure. Systems not satisfying constraints are filtered at each stage where a relaxation is performed. The minimum is taken across all outputs for the final adsorption energy.
Figure 2: Overview of the accuracy-efficiency trade-offs of the proposed AdsorbML methods across several baseline models on the validation set. For each model, speedup and corresponding success rate are plotted for ML+RX and ML+SP across various best-$k$. A system is considered successful if the predicted adsorption energy is within 0.1 eV of the minimum, or lower. All success rates and speedups are relative to Random+Heuristic. Heuristic is shown as a common community baseline. The upper right-hand corner represents the optimal region, maximizing speedup and success rate. The point outlined in pink corresponds to the balanced option: an 86.33% success rate and a 1331x speedup.
Figure 3: Overview of the accuracy-efficiency trade-offs of the proposed AdsorbML methods across several baseline models. For each model, speedup and corresponding success rate are plotted for ML+RX and ML+SP across various best-$k$. A system is considered successful if the predicted adsorption energy is within 0.1 eV of the minimum, or lower. All success rates and speedups are relative to Random+Heuristic. Heuristic is shown as a common community baseline. The upper right-hand corner represents the optimal region, maximizing speedup and success rate. The point highlighted in teal corresponds to the balanced option reported in the abstract: an 87.36% success rate and a 2290x speedup. A similar figure is also provided for the validation set.
...and 2 more figures
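
The best-$k$ procedure described in the AdsorbML algorithm caption, together with the 0.1 eV success criterion from the trade-off captions, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Relaxed`, `adsorbml_best_k`, `is_success`, and `toy_refine` are hypothetical, the energies are made up, and `dft_refine` stands in for whichever refinement step (single-point or full relaxation) is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Relaxed:
    """One relaxed adsorbate-surface configuration (hypothetical container)."""
    config_id: int
    energy: float  # adsorption energy in eV
    valid: bool    # True if no desorption, dissociation, or surface mismatch


def adsorbml_best_k(
    ml_results: List[Relaxed],
    k: int,
    dft_refine: Callable[[Relaxed], Relaxed],
) -> Optional[float]:
    """Rank valid ML relaxations, refine the k lowest, return the minimum energy."""
    # Drop ML relaxations that violate the constraints, then rank by energy.
    ranked = sorted((r for r in ml_results if r.valid), key=lambda r: r.energy)
    # Only the best k candidates reach the expensive refinement step (SP or RX).
    refined = [dft_refine(r) for r in ranked[:k]]
    # Filter again after refinement and take the minimum over what remains.
    energies = [r.energy for r in refined if r.valid]
    return min(energies) if energies else None


def is_success(predicted: Optional[float], reference: float, tol: float = 0.1) -> bool:
    """Success: predicted energy is within tol eV of the reference minimum, or lower."""
    return predicted is not None and predicted <= reference + tol


# Toy data, invented purely for illustration.
ml_results = [
    Relaxed(0, -1.20, True),
    Relaxed(1, -1.50, False),  # lowest ML energy, but fails the constraints
    Relaxed(2, -0.90, True),
    Relaxed(3, -1.10, True),
]
toy_dft = {0: -1.05, 2: -0.95, 3: -1.25}  # stand-in refined energies


def toy_refine(r: Relaxed) -> Relaxed:
    """Stand-in for a DFT single-point or relaxation on an ML-relaxed structure."""
    return Relaxed(r.config_id, toy_dft[r.config_id], True)


best = adsorbml_best_k(ml_results, k=2, dft_refine=toy_refine)
print(best, is_success(best, reference=-1.30))
```

Raising `k` sends more candidates to the refinement step, which is the accuracy-efficiency trade-off the captions describe: in this toy example `k=1` would miss the configuration that turns out lowest after refinement, while `k=2` recovers it.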
