
AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models

Yuanyuan Xu, Hanchen Wang, Wenjie Zhang, Lexing Xie, Yin Chen, Flora Salim, Ying Zhang, Justin Gooding, Toby Walsh

TL;DR

This survey consolidates progress in AI-empowered catalyst discovery by unifying homogeneous and heterogeneous catalysis under four methodological pillars: classical ML, generative and reinforcement learning, graph neural networks, and large language models. It assesses each approach's strengths, limitations, and applicable datasets (e.g., OC20/OC22), and discusses open resources, evaluation metrics, and open-source tools. The authors propose a holistic framework, identify key challenges, and outline future directions such as physics-informed ML, real-time autonomous discovery, and continuous data curation to accelerate catalyst design. Together, these insights provide a roadmap for researchers in computational chemistry and computer science working toward efficient, scalable, and interpretable AI-driven catalyst discovery with practical industrial relevance.
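The TL;DR mentions evaluation metrics alongside the OC20/OC22 datasets. As a hedged illustration (the survey's own metric definitions are not reproduced in this section), two energy metrics used in the OC20 benchmark are mean absolute error (MAE) and energy within threshold (EwT), the fraction of predictions whose absolute error is at most 0.02 eV. The function names below are hypothetical; this is a minimal sketch, not the benchmark's reference implementation.

```python
def _check(pred, target):
    # Both sequences must be non-empty and aligned one-to-one.
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")


def energy_mae(pred, target):
    """Mean absolute error between predicted and reference energies (eV)."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def energy_within_threshold(pred, target, threshold=0.02):
    """Fraction of predictions whose absolute error is <= threshold (eV).

    The 0.02 eV default follows the OC20 EwT convention; treat it as an
    assumption if another benchmark is used.
    """
    _check(pred, target)
    hits = sum(1 for p, t in zip(pred, target) if abs(p - t) <= threshold)
    return hits / len(pred)


if __name__ == "__main__":
    pred, target = [0.0, 0.5], [0.01, 0.2]
    print(energy_mae(pred, target))               # about 0.155
    print(energy_within_threshold(pred, target))  # 0.5
```

MAE rewards small average error, while EwT only counts predictions accurate enough to be chemically useful, so the two are usually reported together.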

Abstract

Catalysts are essential for accelerating chemical reactions and enhancing selectivity, which is crucial for the sustainable production of energy, materials, and bioactive compounds. Catalyst discovery is a fundamental yet challenging problem in computational chemistry and has garnered significant attention owing to the promising performance of advanced Artificial Intelligence (AI) techniques. In particular, the development of Large Language Models (LLMs) has notably accelerated progress in the discovery of both homogeneous and heterogeneous catalysts, whose chemical reactions differ significantly in material phase, temperature, dynamics, and other respects. However, there is currently no comprehensive survey that covers the progress and latest developments in both areas, particularly the application of LLM techniques. To address this gap, this paper presents a thorough and systematic survey of AI-empowered catalyst discovery, employing a unified and general categorization for homogeneous and heterogeneous catalysts. We examine the progress of AI-empowered catalyst discovery, highlight the advantages and disadvantages of each approach, and discuss the challenges faced in this field. Furthermore, we suggest potential directions for future research from the perspective of computer science. Our goal is to help researchers in computational chemistry, computer science, and related fields easily track the latest advancements by providing a clear overview and roadmap of this area. We also organize relevant resources, including article lists and datasets, and make them accessible in an open repository at https://github.com/LuckyGirl-XU/Awesome-Artificial-Intelligence-Empowered-Catalyst-Discovery.

Paper Structure

This paper contains 54 sections, 24 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Comparison between traditional catalyst discovery (left) and AI-empowered catalyst discovery (right).
  • Figure 2: Key Features and Algorithms in Catalyst Design.
  • Figure 3: Holistic categorisation of representative works in AI-empowered approaches for catalyst discovery.
  • Figure 4: Overview of classical methods in catalyst discovery. Regression models are collected from weng2020simple, kovavcevic2021construction, and mishra2023predicting.
  • Figure 5: Overview of generative and reinforcement learning in catalyst discovery lacombe2023adsorbrl.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1: Graph.
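The text of Definition 1 is not reproduced in this summary. As a hedged sketch of the kind of graph that GNN-based catalyst models commonly operate on, the code below encodes an atomic structure as G = (V, E), with atoms as nodes and an undirected edge between any two atoms closer than a cutoff radius. The `Graph` class, the `radius_graph` function, and the cutoff value are illustrative assumptions, not the survey's definition, and periodic boundary conditions (which real slab models need) are ignored for brevity.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical G = (V, E): one node per atom, undirected edges as (i, j) with i < j."""
    nodes: list  # atom symbols, standing in for richer node features
    edges: list = field(default_factory=list)


def radius_graph(symbols, positions, cutoff=3.0):
    """Connect every atom pair whose Euclidean distance is <= cutoff.

    Units are assumed to be angstroms; no periodic images are considered.
    """
    if len(symbols) != len(positions):
        raise ValueError("one position is required per atom")
    g = Graph(nodes=list(symbols))
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= cutoff:
                g.edges.append((i, j))
    return g


if __name__ == "__main__":
    # Toy CO molecule sitting atop a single Pt atom (illustrative coordinates).
    symbols = ["Pt", "C", "O"]
    positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.85), (0.0, 0.0, 3.0)]
    print(radius_graph(symbols, positions, cutoff=2.0).edges)  # [(0, 1), (1, 2)]
```

With a 2.0 cutoff, the Pt-C (1.85) and C-O (1.15) pairs are connected while the Pt-O pair (3.0) is not. The cutoff controls how much local chemical environment each message-passing step sees.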