Are LLMs Ready for Real-World Materials Discovery?
Santiago Miret, N M Anoop Krishnan
TL;DR
This article analyzes the current limitations of applying large language models to real-world materials science (MatSci). It argues that to realize practical impact, MatSci-LLMs must be grounded in domain knowledge, capable of hypothesis generation and testing, and integrated with multi-modal, expertly curated data. The authors detail failure cases, key development challenges, and a multi-modal data-building strategy, then present a structured roadmap for end-to-end MatSci-LLM–driven discovery. They emphasize transparent, accountable deployment with ethical considerations and collaboration across publishers, industry, and academia. Collectively, the work outlines a path toward automated knowledge generation, in-silico design, and self-driving materials laboratories while noting significant hurdles to overcome.
Abstract
Large Language Models (LLMs) create exciting possibilities for powerful language processing tools to accelerate research in materials science. While LLMs have great potential to accelerate materials understanding and discovery, they currently fall short of being practical materials science tools. In this position paper, we show relevant failure cases of LLMs in materials science that reveal their current limitations in comprehending and reasoning over complex, interconnected materials science knowledge. Given those shortcomings, we outline a framework for developing Materials Science LLMs (MatSci-LLMs) that are grounded in materials science knowledge and in hypothesis generation followed by hypothesis testing. The path to performant MatSci-LLMs rests in large part on building high-quality, multi-modal datasets sourced from scientific literature, where various information extraction challenges persist. We therefore describe the key materials science information extraction challenges that must be overcome to build large-scale, multi-modal datasets that capture valuable materials science knowledge. Finally, we outline a roadmap for applying future MatSci-LLMs to real-world materials discovery via: 1. Automated Knowledge Base Generation; 2. Automated In-Silico Material Design; and 3. MatSci-LLM Integrated Self-Driving Materials Laboratories.
