MatLLMSearch: Crystal Structure Discovery with Evolution-Guided Large Language Models

Jingru Gan, Peichen Zhong, Yuanqi Du, Yanqiao Zhu, Chenru Duan, Haorui Wang, Daniel Schwalbe-Koda, Carla P. Gomes, Kristin A. Persson, Wei Wang

TL;DR

MatLLMSearch tackles crystal structure generation and crystal structure prediction (CSP) by using pre-trained LLMs as intelligent proposal agents within an evolutionary loop, avoiding domain-specific fine-tuning. It couples LLM-driven implicit crossover and mutation with universal machine learning interatomic potentials (MLIPs; CHGNet and M3GNet) for fast stability screening, followed by final DFT verification. It reports a metastability rate ($E_\text{d} < 0.1$ eV/atom) of about 76.8%, with approximately 31.7% of structures confirmed stable by DFT; the method also supports multi-objective optimization of properties such as bulk modulus. The framework covers both crystal structure design and structure prediction tasks while maintaining diversity and novelty, all in a training-free setting that reduces overhead. Overall, MatLLMSearch provides a scalable, accessible route for high-throughput materials discovery by leveraging the knowledge already embedded in large language models.

Abstract

Crystal structure generation is fundamental to materials science, enabling the discovery of novel materials with desired properties. While existing approaches leverage Large Language Models (LLMs) through extensive fine-tuning on materials databases, we show that pre-trained LLMs can inherently generate novel and stable crystal structures without additional fine-tuning. Our framework employs LLMs as intelligent proposal agents within an evolutionary pipeline that guides them to perform implicit crossover and mutation operations while maintaining chemical validity. We demonstrate that MatLLMSearch achieves a 78.38% metastable rate validated by machine learning interatomic potentials and 31.7% DFT-verified stability, outperforming specialized models such as CrystalTextLLM. Beyond crystal structure generation, we further demonstrate that our framework adapts to diverse materials design tasks, including crystal structure prediction and multi-objective optimization of properties such as deformation energy and bulk modulus, all without fine-tuning. These results establish MatLLMSearch as a versatile and effective framework for consistent, high-quality materials discovery, offering training-free generation of novel stable structures with reduced overhead and broader accessibility.
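
The evolutionary pipeline described above (propose, evaluate, select) can be illustrated with a minimal sketch. This is not the authors' implementation: `evolve`, `propose`, `score`, and every parameter name and default here are hypothetical. `propose` stands in for the LLM call that performs implicit crossover and mutation on a group of parent structures, and `score` stands in for an MLIP-based stability estimate where lower is better. The toy functions at the end use plain numbers instead of crystal structures so the loop runs without any external model.

```python
import random
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # any structure representation (hypothetical placeholder)


def evolve(
    population: Sequence[S],
    propose: Callable[[List[S]], List[S]],  # stand-in for the LLM proposal step
    score: Callable[[S], float],            # stand-in for an MLIP stability estimate; lower is better
    iterations: int = 5,
    n_parents: int = 2,
    n_groups: int = 4,
    population_size: int = 8,
    seed: int = 0,
) -> List[S]:
    """Elitist evolutionary loop: reproduce, evaluate, select."""
    rng = random.Random(seed)
    pop = sorted(population, key=score)[:population_size]
    for _ in range(iterations):
        children: List[S] = []
        for _ in range(n_groups):
            # Reproduction: hand a small group of parents to the proposer.
            parents = rng.sample(pop, k=min(n_parents, len(pop)))
            children.extend(propose(parents))
        # Evaluation and selection: keep the best-scoring parents and children.
        pop = sorted(pop + children, key=score)[:population_size]
    return pop


# Toy stand-ins (assumptions for illustration only): a "structure" is a float,
# the proposer averages its parents and shrinks the result, and the score is
# the distance from zero.
def toy_propose(parents: List[float]) -> List[float]:
    midpoint = sum(parents) / len(parents)
    return [0.9 * midpoint]


def toy_score(x: float) -> float:
    return abs(x)


initial = [4.0, -3.0, 2.5, 5.0]
final = evolve(initial, toy_propose, toy_score)
print(final[0])
```

Because selection pools parents with children before truncating, the best score in the population can never get worse from one iteration to the next. Whether MatLLMSearch uses exactly this selection rule is not stated in the abstract.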

Paper Structure

This paper contains 38 sections, 1 equation, 14 figures, 9 tables, 1 algorithm.

Figures (14)

  • Figure 1: The workflow of MatLLMSearch for crystal structure generation. Starting from an initial population of known structures, our framework iteratively evolves new crystal structures through LLM-guided reproduction, evaluation, and selection.
  • Figure 2: (a) Pareto frontiers of bulk modulus versus decomposition energy ($E_\text{d}$) for structures optimized towards stability, bulk modulus and multi-objective (multi-turn). Ellipses indicate regions of highest structure density. (b) Examples of predicted crystal structures with composition Na$_3$AlCl$_6$.
  • Figure 3: Element co-occurrence patterns with fluorine (F) in LLM-proposed structures (left) versus MatBench structures (right). Bubble size indicates frequency of occurrence for each element pair, while color intensity represents compositional diversity (darker indicates more unique compositions with that element pair).
  • Figure 4: Decomposition energy ($E_\text{d}$) distribution comparison across experimental configurations. Vertical lines mark the stability thresholds at 0.0 eV/atom (stable) and 0.1 eV/atom (metastable). Reference-guided approaches show more balanced distributions.
  • Figure 5: Ablation analysis comparing reference-guided (Stability) vs. reference-free (noref_iter) generation. (a) $E_\text{d}$ distributions across iterations. (b,c) Space group diversity: with reference structures, generation covers a broad range of space groups at high metastability (b); without references, structures collapse to space group 1 (c).
  • ...and 9 more figures
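
The two quantities that recur in these captions, the metastable fraction at a decomposition-energy threshold (Figure 4) and the Pareto frontier of bulk modulus versus $E_\text{d}$ (Figure 2a), can be computed as in the sketch below. The function names, the strict inequality at the threshold, the GPa unit, and the sample values are all assumptions made for illustration; none of them come from the paper.

```python
from typing import List, Sequence, Tuple

# (decomposition energy in eV/atom, bulk modulus; GPa is assumed here)
Candidate = Tuple[float, float]


def metastable_rate(e_ds: Sequence[float], threshold: float = 0.1) -> float:
    """Fraction of structures with E_d strictly below the threshold."""
    if not e_ds:
        return 0.0
    return sum(1 for e in e_ds if e < threshold) / len(e_ds)


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Non-dominated candidates when minimizing E_d and maximizing bulk modulus."""
    front: List[Candidate] = []
    for i, (e_i, k_i) in enumerate(cands):
        dominated = any(
            # j is at least as good on both objectives and strictly better on one
            (e_j <= e_i and k_j >= k_i) and (e_j < e_i or k_j > k_i)
            for j, (e_j, k_j) in enumerate(cands)
            if j != i
        )
        if not dominated:
            front.append((e_i, k_i))
    return sorted(front)


# Made-up sample values, not results from the paper.
samples: List[Candidate] = [
    (-0.02, 80.0),
    (0.05, 150.0),
    (0.08, 120.0),
    (0.20, 300.0),
    (0.15, 90.0),
]

print(metastable_rate([e for e, _ in samples]))
print(pareto_front(samples))
```

The quadratic dominance check is fine for populations of a few thousand structures; larger sets would call for a sort-based sweep instead.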