Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge

Heegyu Kim, Taeyang Jeon, Seungtaek Choi, Ji Hoon Hong, Dong Won Jeon, Ga-Yeon Baek, Gyeong-Won Kwak, Dong-Hee Lee, Jisu Bae, Chihoon Lee, Yunseo Kim, Seon-Jin Choi, Jin-Seong Park, Sung Beom Cho, Hyunsouk Cho

TL;DR

The paper tackles inefficiencies in materials synthesis by creating Open Materials Guide (OMG), a large-scale, expert-verified dataset of 17K synthesis recipes, and AlchemyBench, an end-to-end benchmark for predicting synthesis components and outcomes. It introduces an LLM-as-a-Judge evaluation framework that demonstrates strong alignment with human expert judgments, enabling scalable benchmarking. OMG is built from 28,685 open-access articles with an LLM-assisted extraction pipeline that yields complete recipes across 10 synthesis methods, validated by eight domain experts. Experiments show that reasoning-based LLMs and retrieval-augmented generation improve synthesis predictions, and that LLM-based scoring correlates better with expert assessments than traditional metrics, supporting scalable, automated evaluation and moving toward fully automated materials discovery. Overall, the work provides a practical data resource and rigorous evaluation framework to accelerate data-driven materials science.

Abstract

Materials synthesis is vital for innovations such as energy storage, catalysis, electronics, and biomedical devices. Yet, the process relies heavily on empirical, trial-and-error methods guided by expert intuition. Our work aims to support the materials science community by providing a practical, data-driven resource. We have curated a comprehensive dataset of 17K expert-verified synthesis recipes from open-access literature, which forms the basis of our newly developed benchmark, AlchemyBench. AlchemyBench offers an end-to-end framework that supports research in large language models applied to synthesis prediction. It encompasses key tasks, including raw materials and equipment prediction, synthesis procedure generation, and characterization outcome forecasting. We propose an LLM-as-a-Judge framework that leverages large language models for automated evaluation, demonstrating strong statistical agreement with expert assessments. Overall, our contributions offer a supportive foundation for exploring the capabilities of LLMs in predicting and guiding materials synthesis, ultimately paving the way for more efficient experimental design and accelerated innovation in materials science.

Paper Structure

This paper contains 36 sections, 1 equation, 14 figures, 10 tables.

Figures (14)

  • Figure 1: An overview of our contributions, featuring the Open Materials Guide Dataset for large-scale synthesis recipes and AlchemyBench for scalable, expert-level evaluation.
  • Figure 2: The periodic table (left) demonstrates that OMG covers diverse elements used in target materials, with darker colors indicating higher usage frequencies. A pie chart (right) illustrates the diversity of synthesis methods, highlighting the contributions of prior studies (white) and our dataset (white + green).
  • Figure 3: An example of an extracted recipe from zhao2020synthesis, demonstrating structured annotation of materials, equipment, procedures, and characterization methods.
  • Figure 4: Impact of Retrieval-Augmented Generation (RAG) on the High Impact set.
  • Figure 5: Yearly distribution of collected material synthesis papers.
  • ...and 9 more figures