CataLM: Empowering Catalyst Design Through Large Language Models
Ludi Wang, Xueqing Chen, Yi Du, Yuanchun Zhou, Yang Gao, Wenjuan Cui
TL;DR
CataLM addresses the need for catalyst-domain AI tools by building a domain-specific LLM for electrocatalytic materials. It combines domain pre-training on a large corpus of open-access electrocatalysis literature with instruction tuning on expert-annotated data, and adds retrieval augmentation to support precise knowledge extraction and design tasks. The model performs competitively on named-entity recognition and on recommending catalyst control methods, and expert validation suggests its outputs are better informed by domain knowledge than those of generic LLMs. An open-source release and planned downstream platforms aim to accelerate human–AI collaboration in catalyst discovery and development.
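The retrieval-augmentation step mentioned above can be illustrated with a minimal sketch. It assumes a toy in-memory passage store and a bag-of-words cosine similarity as the retriever. The passages, the functions `retrieve` and `build_prompt`, and the prompt template are all hypothetical and are not CataLM's actual retriever, corpus, or prompt format.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for electrocatalysis literature snippets.
PASSAGES = [
    "Nitrogen doping of carbon supports can tune the activity of Pt catalysts for oxygen reduction.",
    "Strain engineering in core-shell nanoparticles modifies d-band centers of surface atoms.",
    "Defect engineering in MoS2 exposes edge sites that are active for hydrogen evolution.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (hypothetical retriever)."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: cosine(q, tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Prepend retrieved context to the user question (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("Which control method improves hydrogen evolution on MoS2?", PASSAGES))
```

A production system would replace the bag-of-words scorer with a learned embedding model and a vector index; the overall pattern (retrieve relevant passages, then condition the LLM on them) stays the same.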
Abstract
The field of catalysis holds paramount importance in shaping the trajectory of sustainable development, prompting intensive research efforts to leverage artificial intelligence (AI) in catalyst design. Presently, the fine-tuning of open-source large language models (LLMs) has yielded significant breakthroughs across various domains such as biology and healthcare. Drawing inspiration from these advancements, we introduce CataLM (Catalytic Language Model), a large language model tailored to the domain of electrocatalytic materials. Our findings demonstrate that CataLM exhibits remarkable potential for facilitating human-AI collaboration in catalyst knowledge exploration and design. To the best of our knowledge, CataLM stands as the pioneering LLM dedicated to the catalyst domain, offering novel avenues for catalyst discovery and development.
