LLMatDesign: Autonomous Materials Discovery with Large Language Models

Shuyi Jia, Chao Zhang, Victor Fung

TL;DR

LLMatDesign introduces an autonomous, language-based framework for materials discovery that uses large language models as reasoning engines to select and apply material modifications, validate properties with surrogate predictors, and refine decisions through self-reflection. The approach enables rapid, zero-shot adaptation to new tasks and constraints without extensive training data, achieving target properties more efficiently than baselines. Thorough evaluations show that prompt optimization and self-reflection substantially boost performance, and that the framework can comply with design constraints, pointing toward future integration with autonomous laboratories and multimodal material representations. Overall, the work demonstrates a promising direction for AI-driven, data-efficient materials design leveraging LLMs and automated tooling.

Abstract

Discovering new materials can have significant scientific and technological implications but remains a challenging problem today due to the enormity of the chemical space. Recent advances in machine learning have enabled data-driven methods to rapidly screen or generate promising materials, but these methods still depend heavily on very large quantities of training data and often lack the flexibility and chemical understanding desired in materials discovery. We introduce LLMatDesign, a novel language-based framework for interpretable materials design powered by large language models (LLMs). LLMatDesign utilizes LLM agents to translate human instructions, apply modifications to materials, and evaluate outcomes using provided tools. By incorporating self-reflection on its previous decisions, LLMatDesign adapts rapidly to new tasks and conditions in a zero-shot manner. A systematic in silico evaluation of LLMatDesign on several materials design tasks validates its effectiveness in developing new materials with user-defined target properties in the small data regime. Our framework demonstrates the remarkable potential of autonomous LLM-guided materials discovery in the computational setting and, in the future, in self-driving laboratories.

Paper Structure

This paper contains 22 sections, 26 figures, 6 tables, and 1 algorithm.

Figures (26)

  • Figure 1: Overview of LLMatDesign. The discovery process with LLMatDesign begins with user-provided inputs of chemical composition and target property. It recommends modifications (addition, removal, substitution, or exchange), and uses machine learning tools for structure relaxation and property prediction. Driven by an LLM, this iterative process continues until the target property is achieved, with self-reflection on past modifications fed back into the decision-making process at each step.
  • Figure 2: Prompt template for LLMatDesign with GPT-4o. Text placeholders in red angular brackets are specific to the task given to LLMatDesign. Text placeholders in blue angular brackets are optional and can be omitted if not needed. For Gemini-1.0-pro's prompt template, see Appendix \ref{sec:appendix-prompts}.
  • Figure 3: Prompt template for self-reflection. Text placeholders in red angular brackets are specific to the task given to LLMatDesign.
  • Figure 4: Average band gaps and formation energies over 50 modifications. The grey horizontal line indicates the target band gap of 1.4 eV. The colored dots on the x-axis indicate the average number of modifications taken for each method to reach the target. For formation energy, the goal is to achieve the lowest possible value.
  • Figure 6: Example of LLMatDesign with GPT-4o on the task of modifying the starting material $\text{CdCu}_2\text{GeS}_4$ to achieve a band gap of 1.40 eV. The starting material is retrieved from the Materials Project with chemical formula $\text{Cd}_2\text{Cu}_4\text{Ge}_2\text{S}_8$.
  • ...and 21 more figures
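
The iterative process described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_predict` stands in for the machine learning property predictor, a seeded random sampler stands in for the LLM, the "self-reflection" is a templated note, and the `TOY_SCORES` values are made-up numbers rather than physical data. The exact semantics of the four modification types and the rule of keeping a change only when it moves closer to the target are also assumptions made for illustration; structure relaxation is omitted.

```python
import random

MODIFICATION_TYPES = ("add", "remove", "substitute", "exchange")

# Hypothetical per-element contributions; NOT physical data.
TOY_SCORES = {"Cd": 0.5, "Cu": 0.2, "Ge": 1.0, "S": 2.0, "Se": 1.5, "Zn": 2.5}


def toy_predict(composition):
    """Stand-in for the property predictor: atom-weighted mean of toy scores."""
    total = sum(composition.values())
    return sum(TOY_SCORES[el] * n for el, n in composition.items()) / total


def candidate_modifications(composition):
    """List every valid (type, element_a, element_b) modification."""
    present = sorted(composition)
    absent = sorted(set(TOY_SCORES) - set(composition))
    mods = [("add", el, None) for el in sorted(TOY_SCORES)]
    if sum(composition.values()) > 1:  # never remove the last atom
        mods += [("remove", el, None) for el in present]
    mods += [("substitute", a, b) for a in present for b in absent]
    mods += [("exchange", a, b)
             for i, a in enumerate(present) for b in present[i + 1:]]
    return mods


def apply_modification(composition, mod):
    """Return a modified copy of the composition (assumed semantics)."""
    kind, a, b = mod
    new = dict(composition)
    if kind == "add":           # add one atom of a
        new[a] = new.get(a, 0) + 1
    elif kind == "remove":      # remove one atom of a
        new[a] -= 1
    elif kind == "substitute":  # replace every a with b (b is absent)
        new[b] = new.pop(a)
    elif kind == "exchange":    # swap the counts of a and b
        new[a], new[b] = new[b], new[a]
    return {el: n for el, n in new.items() if n > 0}


def design_loop(start, target, tolerance=0.05, max_steps=50, seed=0):
    """Propose, evaluate, and reflect until the target is reached or the budget ends."""
    rng = random.Random(seed)
    current, history = dict(start), []
    for _ in range(max_steps):
        value = toy_predict(current)
        if abs(value - target) <= tolerance:
            break
        # Feed reflections back in: skip modifications that already
        # failed to help when tried from this same composition.
        rejected = {h["mod"] for h in history
                    if h["before"] == current and not h["improved"]}
        candidates = candidate_modifications(current)
        options = [m for m in candidates if m not in rejected] or candidates
        mod = rng.choice(options)
        candidate = apply_modification(current, mod)
        new_value = toy_predict(candidate)
        improved = abs(new_value - target) < abs(value - target)
        verdict = "helped" if improved else "did not help"
        history.append({
            "before": current,
            "mod": mod,
            "improved": improved,
            "reflection": f"{mod[0]} {verdict}: {value:.2f} -> {new_value:.2f}",
        })
        if improved:  # assumption: keep a change only if it moves closer
            current = candidate
    return current, toy_predict(current), history


if __name__ == "__main__":
    start = {"Cd": 2, "Cu": 4, "Ge": 2, "S": 8}  # formula from the Figure 6 caption
    final, value, history = design_loop(start, target=1.4)
    print(final, round(value, 3), len(history))
```

The starting composition reuses the formula quoted in the Figure 6 caption, but the target of 1.4 here is in the toy score's arbitrary units, not eV. In the framework itself the proposal and reflection steps are produced by an LLM prompt rather than by sampling and string templates.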