MSP-LLM: A Unified Large Language Model Framework for Complete Material Synthesis Planning

Heewoong Noh; Gyoung S. Na; Namkyeong Lee; Chanyoung Park

TL;DR

MSP-LLM tackles the complete Material Synthesis Planning problem by decomposing it into precursor prediction (PP) and synthesis operation prediction (SOP), and by introducing a discrete material class variable $G$ to create a disciplined decision chain. It uses independently fine-tuned LLMs for PP and SOP, augmented with hierarchical precursor types and a precursor constraint factorization (PCF) to explicitly preserve precursor information in decoding, enabling coherent end-to-end MSP from target compositions. Experiments on real inorganic synthesis data show that MSP-LLM outperforms baselines on PP, SOP, and the full MSP task, with the complete MSP Top-10 accuracy reaching 18.56% (Split 1) and 23.23% (Split 2). An information-theoretic analysis clarifies why PCF reduces irreducible uncertainty and improves precursor-aware operation generation, supporting the framework’s robustness and practical potential for accelerating materials discovery.
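The decision chain and the intuition behind PCF can be written with two standard identities. The notation is assumed here for illustration and may not match the paper's: $T$ is the target composition, $G$ the discrete material class, $P$ the precursor set, and $O$ the synthesis operation sequence.

```latex
\begin{aligned}
p(G, P, O \mid T) &= p(G \mid T)\, p(P \mid T, G)\, p(O \mid T, G, P), \\
H(O \mid T, G) - H(O \mid T, G, P) &= I(O; P \mid T, G) \;\ge\; 0.
\end{aligned}
```

The first line is the chain rule applied in the order target, class, precursors, operations. The second line says that conditioning on the precursors can never increase uncertainty about the operations, and the reduction equals the conditional mutual information $I(O; P \mid T, G)$. As a rough reading, a decoder that lets precursor information fade from its state effectively conditions on less than $P$ and forfeits part of this reduction, which is what PCF is meant to prevent. The paper's own analysis may be formulated differently.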

Abstract

Material synthesis planning (MSP) remains a fundamental and underexplored bottleneck in AI-driven materials discovery, as it requires not only identifying suitable precursor materials but also designing coherent sequences of synthesis operations to realize a target material. Although several AI-based approaches have been proposed to address isolated subtasks of MSP, a unified methodology for solving the entire MSP task has yet to be established. We propose MSP-LLM, a unified LLM-based framework that formulates MSP as a structured process composed of two constituent subproblems: precursor prediction (PP) and synthesis operation prediction (SOP). Our approach introduces a discrete material class as an intermediate decision variable that organizes both tasks into a chemically consistent decision chain. For SOP, we further incorporate hierarchical precursor types as synthesis-relevant inductive biases and employ an explicit conditioning strategy that preserves precursor-related information in the autoregressive decoding state. Extensive experiments show that MSP-LLM consistently outperforms existing methods on both PP and SOP, as well as on the complete MSP task, demonstrating an effective and scalable framework for MSP that can accelerate real-world materials discovery.
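A minimal sketch of the PP then SOP decision chain with explicit conditioning, assuming the material class is produced by a separate step from the target and that "explicit conditioning" means writing the predicted precursors into the SOP input text. The names `plan`, `build_sop_prompt`, `SynthesisPlan`, and the `toy_*` functions are hypothetical stand-ins for the fine-tuned LLMs, not the paper's code; hierarchical precursor types and the internals of PCF are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical interfaces standing in for the fine-tuned models.
ClassModel = Callable[[str], str]             # target -> material class G
PPModel = Callable[[str, str], List[str]]     # (target, G) -> precursor set
SOPModel = Callable[[str], List[str]]         # prompt text -> operation sequence


@dataclass
class SynthesisPlan:
    target: str
    material_class: str
    precursors: List[str]
    operations: List[str]


def build_sop_prompt(target: str, material_class: str, precursors: Sequence[str]) -> str:
    """Explicit conditioning: the predicted precursors are written into the SOP input."""
    return (
        f"Target: {target}\n"
        f"Class: {material_class}\n"
        f"Precursors: {', '.join(precursors)}\n"
        "Operations:"
    )


def plan(target: str, classify: ClassModel,
         predict_precursors: PPModel, predict_operations: SOPModel) -> SynthesisPlan:
    g = classify(target)                              # step 1: material class G
    precursors = predict_precursors(target, g)        # step 2: PP, conditioned on T and G
    prompt = build_sop_prompt(target, g, precursors)
    operations = predict_operations(prompt)           # step 3: SOP, conditioned on T, G, P
    return SynthesisPlan(target, g, precursors, operations)


# Toy stubs (illustrative only, not real model outputs).
def toy_classify(target: str) -> str:
    return "oxide"


def toy_pp(target: str, g: str) -> List[str]:
    return ["Li2CO3", "Co3O4"]


def toy_sop(prompt: str) -> List[str]:
    # Reads the precursor line, so the output depends on the conditioning.
    return ["mix", "calcine"] if "Li2CO3" in prompt else ["mix"]


result = plan("LiCoO2", toy_classify, toy_pp, toy_sop)
print(result.operations)  # ['mix', 'calcine']
```

Passing the models in as plain callables keeps the order of the chain (target, class, precursors, operations) explicit; replacing a stub with a real LLM call would not change that structure.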

Paper Structure

This paper contains 27 sections, 4 equations, 3 figures, and 10 tables.

Introduction
Related Works
Preliminaries
Proposed Method
Decision Chain via Discrete Material Class Variable
Precursor Prediction (PP) Stage
Synthesis Operation Prediction (SOP) Stage
Challenges of Precursor Utilization
Hierarchical Precursor Types
Precursor Constraint Factorization
A Unified Framework for Complete MSP
Why Precursor Constraint Factorization Works: An Information-Theoretic View.
Experiments
Precursor Prediction (PP) Task
Synthesis Operation Prediction (SOP) Task
...and 12 more sections

Figures (3)

Figure 1: Overall framework of MSP-LLM. Given a target material, MSP-LLM first predicts the precursor set (PP), and then, conditioned on the predicted precursors, generates the synthesis operation sequence (SOP), resulting in a complete MSP. The input–output texts above the PP and SOP LLMs illustrate example prompts used during LLM fine-tuning for each task.
Figure 2: SOP performance under target-only, implicit, and explicit conditioning across two fine-tuned backbone LLMs.
Figure 3: Effect of the material group decision chain on performance (NED) across 11 material classes.
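Figure 3 reports NED, and the TL;DR reports Top-10 accuracy for complete MSP, but neither metric is defined in this section. The sketch below assumes NED is the token-level Levenshtein distance between predicted and reference operation sequences divided by the longer length (lower is better), and that a complete-MSP Top-k hit requires one of the top k candidates to match both the precursor set (order-free) and the operation sequence (ordered) exactly. The function names are hypothetical, and the paper's exact definitions may differ.

```python
from typing import Hashable, List, Sequence, Tuple


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance over tokens (unit cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Edit distance divided by the longer sequence length; 0.0 means identical."""
    denom = max(len(pred), len(gold))
    return 0.0 if denom == 0 else edit_distance(pred, gold) / denom


Candidate = Tuple[Sequence[str], Sequence[str]]  # (precursors, operations)


def complete_msp_topk_hit(candidates: List[Candidate], gold_precursors: Sequence[str],
                          gold_operations: Sequence[str], k: int) -> bool:
    """True if any of the first k candidates matches the gold plan exactly.

    Precursors are compared as sets (order-free); operations as ordered sequences.
    """
    gold_p, gold_o = frozenset(gold_precursors), list(gold_operations)
    return any(frozenset(p) == gold_p and list(o) == gold_o for p, o in candidates[:k])


ned = normalized_edit_distance(["mix", "heat", "grind"], ["mix", "heat"])
print(round(ned, 3))  # 0.333
```

Top-k accuracy over a test set would then be the fraction of targets for which `complete_msp_topk_hit` returns `True` with `k=10`.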
