MatterChat: A Multi-Modal LLM for Material Science
Yingheng Tang, Wenbin Xu, Jie Cao, Weilu Gao, Steve Farrell, Benjamin Erichson, Michael W. Mahoney, Andy Nonaka, Zhi Yao
TL;DR
MatterChat tackles the challenge of integrating full-resolution atomic structures into language-based reasoning for materials science by introducing a bridging module that aligns a pretrained interatomic potential with a pretrained LLM. The architecture combines a Graph-based Material Processing Branch (CHGNet) with a Language Processing Branch (Mistral 7B) through a BLIP2-inspired Bridge Model, enabling effective multi-modal reasoning and text generation. Across nine material-property tasks, MatterChat surpasses open-source LLMs and physical ML baselines in both classification and quantitative predictions, while enabling advanced scientific reasoning and step-by-step synthesis guidance. The approach leverages Retrieval-Augmented Generation and structure-aware embeddings to improve robustness and applicability for accelerated material discovery, with potential impact on energy, electronics, and beyond.
Abstract
Understanding and predicting the properties of inorganic materials is crucial for accelerating advances in materials science and driving applications in energy, electronics, and beyond. Integrating material structure data with language-based information through multi-modal large language models (LLMs) offers great potential to support these efforts by enhancing human-AI interaction. However, a key challenge lies in integrating atomic structures at full resolution into LLMs. In this work, we introduce MatterChat, a versatile structure-aware multi-modal LLM that unifies material structural data and textual inputs into a single cohesive model. MatterChat employs a bridging module to align a pretrained machine learning interatomic potential with a pretrained LLM, reducing training costs and enhancing flexibility. Our results demonstrate that MatterChat significantly improves performance in material property prediction and human-AI interaction, surpassing general-purpose LLMs such as GPT-4. We also demonstrate its usefulness in applications such as advanced scientific reasoning and step-by-step material synthesis.
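
To make the bridging idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the bridge can be approximated by a single cross-attention step in which a fixed set of query tokens attends over per-atom embeddings and the result is projected into the LLM's embedding space. A BLIP2-style bridge also includes self-attention and feed-forward layers, which are omitted here. All names (`bridge`, `random_matrix`, `project`), the toy dimensions, and the random weights are hypothetical; random vectors stand in for the per-atom embeddings a pretrained interatomic potential would produce.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Stand-in for trained weights or embeddings: Gaussian random values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def project(vectors, weight):
    """Apply a bias-free linear map to each row vector: out = v @ weight."""
    return [
        [sum(v[i] * weight[i][j] for i in range(len(v))) for j in range(len(weight[0]))]
        for v in vectors
    ]


def bridge(atom_embeddings, queries, w_key, w_value, w_out):
    """Cross-attend a fixed set of query tokens over per-atom embeddings.

    Returns one vector per query token in the (hypothetical) LLM embedding
    space, regardless of how many atoms the structure contains.
    """
    keys = project(atom_embeddings, w_key)
    values = project(atom_embeddings, w_value)
    scale = math.sqrt(len(queries[0]))
    pooled = []
    for q in queries:
        # Scaled dot-product attention of this query over all atoms.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        pooled.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return project(pooled, w_out)


rng = random.Random(0)
ATOM_DIM, QUERY_DIM, LLM_DIM, NUM_QUERIES = 8, 6, 12, 4  # toy sizes, not the paper's

queries = random_matrix(NUM_QUERIES, QUERY_DIM, rng)
w_key = random_matrix(ATOM_DIM, QUERY_DIM, rng)
w_value = random_matrix(ATOM_DIM, QUERY_DIM, rng)
w_out = random_matrix(QUERY_DIM, LLM_DIM, rng)

# Two structures with different atom counts map to the same token shape.
small_cell = random_matrix(2, ATOM_DIM, rng, scale=1.0)
large_cell = random_matrix(9, ATOM_DIM, rng, scale=1.0)
tokens_small = bridge(small_cell, queries, w_key, w_value, w_out)
tokens_large = bridge(large_cell, queries, w_key, w_value, w_out)
print(len(tokens_small), len(tokens_small[0]))
print(len(tokens_large), len(tokens_large[0]))
```

The point of the sketch is the shape contract: a structure with any number of atoms is reduced to the same fixed number of tokens, which could then be prepended to the text tokens of a language model. In this kind of setup, only the bridge parameters would need training while the two pretrained models stay fixed, which is one plausible reading of the reduced training cost the abstract mentions.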
