MatExpert: Decomposing Materials Discovery by Mimicking Human Experts

Qianggang Ding, Santiago Miret, Bang Liu

TL;DR

MatExpert is a novel framework that leverages Large Language Models (LLMs) and contrastive learning to accelerate the discovery and design of new solid-state materials. It outperforms state-of-the-art methods in material generation tasks.

Abstract

Material discovery is a critical research area with profound implications for various industries. In this work, we introduce MatExpert, a novel framework that leverages Large Language Models (LLMs) and contrastive learning to accelerate the discovery and design of new solid-state materials. Inspired by the workflow of human materials design experts, our approach integrates three key stages: retrieval, transition, and generation. First, in the retrieval stage, MatExpert identifies an existing material that closely matches the desired criteria. Second, in the transition stage, MatExpert outlines the modifications needed to transform this material so that it meets the requirements specified in the initial user query. Third, in the generation stage, MatExpert performs detailed computations and structural generation to create new materials based on the provided information. Our experimental results demonstrate that MatExpert outperforms state-of-the-art methods in material generation tasks, achieving superior performance across various metrics including validity, distribution, and stability. As such, MatExpert represents a meaningful advancement in computational material discovery using language-based generative models.

Paper Structure

This paper contains 36 sections, 1 equation, 9 figures, 5 tables.

Figures (9)

  • Figure 1: MatExpert achieves remarkable performance on all metrics compared with baselines, especially on the distance metrics. See Table \ref{tab:comparison} for details.
  • Figure 2: We utilize a contrastive learning framework to train two encoders for reference material retrieval. For a given sample material (e.g., Na3MnCoNiO6), we extract its property description using pymatgen (https://pymatgen.org/) \cite{ong2013python} and its structural description using robocrystallographer (https://github.com/hackingmaterials/robocrystallographer) \cite{ganose2019robocrystallographer}. The model employs two T5-based encoders \cite{raffel2020exploring}, which are trained to minimize the distance between these two representations.
  • Figure 3: Pipeline of MatExpert: Given a description of the desired material, MatExpert first retrieves the most similar material from the database (e.g., $\text{Na}_3\text{MnCoNiO}_6$). Next, the LLM provides transition pathways to modify the retrieved material into the desired material (e.g., replacing Na with Li). Finally, the LLM generates the detailed structural information of the desired material ($\text{Li}_3\text{MnCoNiO}_6$). See Figure \ref{fig:case} for a full case of conditional material generation.
  • Figure 4: Conditional satisfaction rates of common property constraints. MatExpert consistently outperforms baseline methods.
  • Figure 5: Scores of diversity and novelty, normalized to the testing samples. MatExpert consistently achieves remarkable scores on all metrics. Notably, compared with Crystal-LLM, MatExpert consistently maintains high novelty regardless of model size.
  • ...and 4 more figures
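
The retrieval setup described in the Figure 2 caption (two encoders trained so that a material's property description and structural description end up close together) can be illustrated with a small sketch. The text above does not state the exact loss, so the symmetric InfoNCE objective, the temperature value, and every function name below are assumptions chosen for illustration. The toy vectors stand in for the outputs of the two T5-based encoders, which are not reproduced here.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_matrix(a_rows, b_rows):
    """Return the matrix of cosine similarities between rows of a and rows of b."""
    a = [normalize(v) for v in a_rows]
    b = [normalize(v) for v in b_rows]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_info_nce(prop_emb, struct_emb, temperature=0.1):
    """Assumed loss: pair i of property/structure embeddings is the positive,
    all other pairs in the batch are negatives, averaged over both directions."""
    sims = cosine_matrix(prop_emb, struct_emb)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

def retrieve(query_emb, struct_emb):
    """Return the index of the database structure most similar to the query."""
    sims = cosine_matrix([query_emb], struct_emb)[0]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy embeddings (hypothetical values, not real encoder outputs).
prop = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
struct_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
struct_shuffled = [struct_aligned[1], struct_aligned[2], struct_aligned[0]]

loss_aligned = symmetric_info_nce(prop, struct_aligned)
loss_shuffled = symmetric_info_nce(prop, struct_shuffled)
best_match = retrieve([1.0, 0.05], struct_aligned)
```

In this toy setting the loss is lower when each property embedding sits next to its own structure embedding than when the pairs are shuffled, which is the behavior a contrastive objective rewards. After training, retrieval reduces to a nearest-neighbor lookup of the encoded query against the encoded database entries.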