Large language models in materials science and the need for open-source approaches

Fengxu Yang, Weitong Chen, Jack D. Evans

TL;DR

This paper surveys how large language models are transforming materials science across data extraction, predictive modeling, and autonomous experimentation. It contrasts closed-source and open-source LLMs, showing that open-source ecosystems can match performance while enhancing transparency, reproducibility, and data privacy. Concrete advances include sequence-aware data extraction and platform-level data integration (e.g., MOF-ChemUnity, Material String encoding) and the emergence of agent-based discovery workflows (ChatMOF, Coscientist, ChemAgents, MOFGen). It also highlights challenges in evaluating autonomous, multi-step reasoning systems and calls for standardized benchmarks and open data to foster trustworthy, scalable AI-driven discovery in materials science.

Abstract

Large language models (LLMs) are rapidly transforming materials science. This review examines recent LLM applications across the materials discovery pipeline, focusing on three key areas: mining scientific literature, predictive modelling, and multi-agent experimental systems. We highlight how LLMs extract valuable information such as synthesis conditions from text, learn structure-property relationships, and coordinate agentic systems that integrate computational tools and laboratory automation. While progress has been largely dependent on closed-source commercial models, our benchmark results demonstrate that open-source alternatives can match performance while offering greater transparency, reproducibility, cost-effectiveness, and data privacy. As open-source models continue to improve, we advocate their broader adoption to build accessible, flexible, and community-driven AI platforms for scientific discovery.

Paper Structure

This paper contains 4 sections, 5 figures.

Figures (5)

  • Figure 1: The MOF-ChemUnity workflow. LLMs are used to link publications and CSD entries by extracting experimental properties and matching compound names across literature and structure files. This structured data populates the knowledge graph, which combines synthesis, applications, and related information. Reproduced from ref. pruyn_mof-chemunity_2025, licensed under CC BY-NC 4.0.
  • Figure 2: Performance benchmark of open-source LLMs on the MOF-ChemUnity synthesis conditions extraction task. Model performance is plotted across three key dimensions: accuracy (%), average inference time (s/task), and estimated VRAM usage (GB) under bfloat16 precision (color scale). Note that Qwen3 models exhibit significantly higher average inference times than similarly sized models. This is attributed to their reasoning process prior to output generation. All models were evaluated with thinking mode enabled.
  • Figure 3: Framework for predicting material synthesisability and synthesis routes. (a) Structural data encoded as a material string is used to train (c) a "Synthesizability LLM". Reproduced from ref. song_accurate_2025, licensed under CC BY-NC-ND 4.0.
  • Figure 4: Comparison of synthesis condition recommendation scores (median per-sample) for different fine-tuned open-source models. The result for GPT-4o (left) is the reported score from the original study.
  • Figure 5: A comprehensive system featuring a central LLM-based "Planner" that orchestrates and manages the entire research workflow. Reproduced from ref. boiko_autonomous_2023, licensed under CC BY 4.0.