LLM-based MOFs Synthesis Condition Extraction using Few-Shot Demonstrations
Lei Shi, Zhimeng Liu, Yi Yang, Weize Wu, Yuyang Zhang, Hongbo Zhang, Jing Lin, Siyu Wu, Zihan Chen, Ruiming Li, Nan Wang, Zipeng Liu, Huobin Tan, Hongyi Gao, Yue Zhang, Ge Wang
TL;DR
The paper presents a few-shot in-context learning framework for extracting MOF synthesis conditions from the literature. Human–AI interactive data curation supplies high-quality demonstrations, and retrieval-augmented generation selects the 4–6 best demonstrations for each paragraph. By incorporating external material knowledge through prompt engineering and post-processing, the approach achieves high extraction accuracy across multiple MOF datasets and outperforms zero-shot LLM baselines. The work also demonstrates downstream impact on MOF design: a real-world lab synthesis guided by LLM extractions yields a material whose BET surface area surpasses 91.1% of literature samples of the same class, and an online engine and database enable scalable, high-throughput literature processing. Overall, the method reduces data-labeling costs compared with fine-tuning while enabling accurate synthesis-route extraction and material inference for structural and property design.
Abstract
Extracting the synthesis routes of metal-organic frameworks (MOFs) from the literature is crucial for the rational design of MOFs with desirable functionality. The recent advent of large language models (LLMs) provides a disruptively new solution to this long-standing problem. While recent studies mostly rely on primitive zero-shot LLMs that lack specialized material knowledge, this work introduces the few-shot LLM in-context learning paradigm. First, a human-AI interactive data curation approach is proposed to secure high-quality demonstrations. Second, an information retrieval algorithm is applied to select the few-shot demonstrations for each extraction and to determine how many to use. On three datasets randomly sampled from nearly 90,000 well-defined MOFs, we conduct three evaluations to validate our method. In synthesis extraction, structure inference, and material design, the proposed few-shot LLMs all significantly outperform zero-shot LLMs and baseline methods. The lab-synthesized material guided by the LLM surpasses 91.1% of the high-quality MOFs of the same class reported in the literature on the key physical property of specific surface area.
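
To make the retrieval step concrete, the following is a minimal Python sketch of how few-shot demonstrations could be selected for a paragraph and assembled into a prompt. It is not the authors' algorithm: the bag-of-words cosine similarity, the function names (select_demonstrations, build_prompt), the prompt wording, and the toy demonstration pool are all assumptions made for illustration, and the default k=4 only mirrors the lower end of the 4–6 range quoted in the TL;DR.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_demonstrations(paragraph, pool, k=4):
    """Return the k pool entries whose text is most similar to the paragraph.

    Assumption: term-count cosine similarity stands in for whatever
    retrieval scoring the paper actually uses.
    """
    query = Counter(tokenize(paragraph))
    scored = [
        (cosine(query, Counter(tokenize(demo["text"]))), index)
        for index, demo in enumerate(pool)
    ]
    # Highest score first; ties keep the original pool order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pool[index] for _, index in scored[:k]]


def build_prompt(paragraph, demos):
    """Assemble a few-shot prompt: instruction, demonstrations, then the query."""
    parts = ["Extract the synthesis conditions from the paragraph."]
    for demo in demos:
        parts.append(f"Paragraph: {demo['text']}\nConditions: {demo['label']}")
    parts.append(f"Paragraph: {paragraph}\nConditions:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical curated demonstrations, invented for this sketch only.
    pool = [
        {
            "text": "A mixture of zinc nitrate and terephthalic acid in DMF "
                    "was heated at 120 C for 24 h.",
            "label": "metal=zinc nitrate; linker=terephthalic acid; "
                     "solvent=DMF; temperature=120 C; time=24 h",
        },
        {
            "text": "The crystals were characterized by powder X-ray diffraction.",
            "label": "no synthesis conditions",
        },
    ]
    target = "Zinc nitrate and terephthalic acid were dissolved in DMF and heated."
    print(build_prompt(target, select_demonstrations(target, pool, k=1)))
```

In practice the similarity function would likely be replaced by a stronger retriever (for example BM25 or dense embeddings), and the resulting prompt would be sent to the LLM; both choices are outside what this summary specifies.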
