AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models
Yuanyuan Xu, Hanchen Wang, Wenjie Zhang, Lexing Xie, Yin Chen, Flora Salim, Ying Zhang, Justin Gooding, Toby Walsh
TL;DR
This survey consolidates progress in AI-empowered catalyst discovery by unifying homogeneous and heterogeneous catalysis under four methodological pillars: classical ML, generative and reinforcement learning, graph neural networks, and large language models. It assesses each approach's strengths, limitations, and applicable datasets (e.g., OC20/OC22), and discusses open resources, evaluation metrics, and open-source tools. The authors propose a holistic framework, identify key challenges, and outline future directions such as physics-informed ML, real-time autonomous discovery, and continuous data curation to accelerate catalyst design. Together, these insights give researchers in computational chemistry and computer science a roadmap toward efficient, scalable, and interpretable AI-driven catalyst discovery with practical industrial relevance.
Abstract
Catalysts are essential for accelerating chemical reactions and enhancing selectivity, which is crucial for the sustainable production of energy, materials, and bioactive compounds. Catalyst discovery is a fundamental yet challenging problem in computational chemistry, and it has garnered significant attention due to the promising performance of advanced Artificial Intelligence (AI) techniques. The development of Large Language Models (LLMs) has notably accelerated progress in the discovery of both homogeneous and heterogeneous catalysts, whose reactions differ significantly in material phase, temperature, dynamics, and other respects. However, no comprehensive survey currently covers the progress and latest developments in both areas, particularly with respect to the application of LLM techniques. To address this gap, this paper presents a thorough and systematic survey of AI-empowered catalyst discovery, employing a unified and general categorization for homogeneous and heterogeneous catalysts. We examine the progress of AI-empowered catalyst discovery methods, highlight their individual advantages and disadvantages, and discuss the challenges faced in this field. Furthermore, we suggest potential directions for future research from the perspective of computer science. Our goal is to help researchers in computational chemistry, computer science, and related fields track the latest advancements easily by providing a clear overview and roadmap of this area. We also organize relevant resources, including article lists and datasets, and make them accessible in an open repository at https://github.com/LuckyGirl-XU/Awesome-Artificial-Intelligence-Empowered-Catalyst-Discovery.
