Foundational Large Language Models for Materials Research

Vaibhav Mishra, Somaditya Singh, Dhruv Ahlawat, Mohd Zaki, Vaibhav Bihani, Hargun Singh Grover, Biswajit Mishra, Santiago Miret, Mausam, N. M. Anoop Krishnan

TL;DR

LLaMat introduces domain-adapted foundation models for materials science, developed in three stages: continued pretraining on a materials-focused corpus that includes crystallographic information file (CIF) data, followed by instruction tuning and task-specific finetuning. The two main variants, LLaMat-Chat and LLaMat-CIF, perform strongly on the MatNLP, MatSIE, and crystal-structure generation benchmarks, often outperforming larger general-purpose LLMs. A notable finding is 'adaptation rigidity': the more extensively pretrained base model (LLaMA-3) yields domain-adapted models that underperform their LLaMA-2-based counterparts on certain tasks, highlighting the nuanced relationship between pretraining scale and domain adaptation. Together, these results support the feasibility of deployable, specialized AI copilots for materials research and offer guidance on model selection, training methodology, and domain-specific performance considerations for scientific AI systems.

Abstract

Materials discovery and development are critical for addressing global challenges. Yet the exponential growth of the materials science literature, comprising vast amounts of textual data, has created significant bottlenecks in knowledge extraction, synthesis, and scientific reasoning. Large Language Models (LLMs) offer unprecedented opportunities to accelerate materials research through automated analysis and prediction. Still, their effective deployment requires domain-specific adaptation for understanding and solving domain-relevant tasks. Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models on an extensive corpus of materials literature and crystallographic data. Through systematic evaluation, we demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities. The specialized LLaMat-CIF variant demonstrates unprecedented capabilities in crystal structure generation, predicting stable crystals with high coverage across the periodic table. Intriguingly, although LLaMA-3 outperforms LLaMA-2 as a base model, we observe that LLaMat-2 achieves unexpectedly stronger domain-specific performance than LLaMat-3 across diverse materials science tasks, including structured information extraction from text and tables and, most notably, crystal structure generation, suggesting a potential adaptation rigidity in overtrained LLMs. Altogether, the present work demonstrates the effectiveness of domain adaptation towards developing practically deployable LLM copilots for materials research. Beyond materials science, our findings reveal important considerations for domain adaptation of LLMs, such as model selection, training methodology, and domain-specific performance, which may influence the development of specialized scientific AI systems.

Paper Structure

This paper contains 48 sections, 1 equation, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Development pipeline and capabilities of LLaMat for MatSci applications. The schematic illustrates the two-stage development of LLaMat, beginning with continued pretraining on MatSci corpora (top), followed by specialized instruction finetuning pathways (left and right). The pie chart shows the pretraining dataset composition: peer-reviewed publications (94.43%), crystallographic information files (CIF, 2.50%), and a subset of RedPajama (3.051%). Two distinct finetuning pathways yield LLaMat-Chat, a materials research copilot capable of structured information extraction and materials NLP tasks (left branch), and LLaMat-CIF, specialized in crystal structure analysis and generation (right branch). Representative examples illustrate the dataset details and the model's capabilities in handling diverse MatSci queries and tasks.
  • Figure 2: Comparative performance of LLaMat and LLaMA models on MatSci and general language tasks, benchmarked against closed-source models (Claude, Gemini, and GPT-4o). LLaMA-FT denotes the Meta LLaMA models finetuned on our training corpus. a, Micro-F1 and b, Macro-F1 scores on MatSci tasks. c, Radar plot illustrating task-specific performance across diverse MatSci applications, including entity recognition, relation extraction, and classification tasks. Only the top models from each family are included in the radar plot: LLaMat-3-chat, LLaMat-2-chat, Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For MatSci tasks, higher scores indicate better performance in extracting domain-specific information, identifying relationships between materials entities, and classifying scientific text. Results demonstrate that domain-specific pretraining enhances MatSci task performance while preserving general language capabilities.
  • Figure 3: Performance evaluation of structured information extraction capabilities across MatSci subdomains. a, Bar plot of mean F1 score across all structured information extraction tasks in doping, metal-organic frameworks, and general materials science; b, Radar plot of F1 score for each relation extraction task; c, Bar plot of mean accuracy over all materials science table data extraction tasks; d, Radar plot of F1 score for individual table data extraction tasks. Only the top models from each family are included in the radar plots: LLaMat-3-chat, LLaMat-2-chat, Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o.
  • Figure 4: Comparative compositional and structural analysis of 10,000 crystal structures generated by the LLaMat-2-CIF model and their relaxed counterparts. a, Energy per atom (eV/atom); b, Number of elements in each crystal structure (the inset shows the number of crystals containing each number of unique elements); c, Distribution of Bravais lattice systems; d, Lattice parameters (unit cell lengths a, b, and c along the x-, y-, and z-axes); e, Lattice parameters ($\alpha$, $\beta$, and $\gamma$, i.e., the angles between b and c, a and c, and a and b, respectively); f, Periodic table heat map of elemental frequency, where color intensity represents generation frequency; grey cells indicate elements absent from the generated structures.
  • Figure A.1: Distribution of Bravais lattices in the CIF training dataset.
  • ...and 3 more figures
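Figures 2 and 3 report both Micro-F1 and Macro-F1. The Python sketch below is a rough illustration of how the two differ. It computes both metrics for single-label classification from parallel lists of gold and predicted labels. The function names and toy labels are hypothetical, and the paper's own scoring may differ (for example, it may use span-level matching for entity recognition).

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: list[str], pred: list[str]) -> tuple[float, float]:
    """Micro- and macro-F1 for single-label multi-class predictions (illustrative sketch)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("need at least one example")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    # Micro-F1 pools counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages per-label F1, so rare labels weigh as much as common ones.
    macro = sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)
    return micro, macro
```

Suppose the gold labels are ["MAT", "MAT", "MAT", "PRO"] and a model always predicts "MAT". Micro-F1 is then 0.75, while Macro-F1 drops to about 0.43. The drop happens because the missed rare label contributes half of the macro average.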
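Figure 4 (d, e) reports lattice parameters, using the convention stated in the caption: $\alpha$ is the angle between b and c, $\beta$ between a and c, and $\gamma$ between a and b. As a hedged illustration of that convention, the sketch below computes the six parameters from three unit-cell vectors. The helper names are hypothetical, and this is not the paper's analysis code, which may rely on a crystallography library instead.

```python
import math

Vec = tuple[float, float, float]


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two vectors in degrees."""
    cos = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def lattice_parameters(a_vec: Vec, b_vec: Vec, c_vec: Vec) -> tuple[float, ...]:
    """Return (a, b, c, alpha, beta, gamma) for the given cell vectors (illustrative sketch)."""
    a, b, c = (math.sqrt(_dot(v, v)) for v in (a_vec, b_vec, c_vec))
    alpha = _angle_deg(b_vec, c_vec)  # angle between b and c
    beta = _angle_deg(a_vec, c_vec)   # angle between a and c
    gamma = _angle_deg(a_vec, b_vec)  # angle between a and b
    return a, b, c, alpha, beta, gamma
```

Consider a hexagonal cell with a = 3 and c = 5, given by the vectors (3, 0, 0), (-1.5, 3√3/2, 0), and (0, 0, 5). For this cell the function returns a = b = 3, c = 5, α = β = 90°, and γ = 120°, up to floating-point rounding.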