Accelerating Two-Dimensional Materials Research via a Universal Interatomic Potential and Large Language Model Agent

Haidi Wang, Yufan Yao, Haonan Song, Xiaofeng Liu, Zhao Chen, Weiwei Chen, Weiduo Zhu, Zhongjun Li, Jinlong Yang

TL;DR

The paper presents a universal machine-learning (ML) interatomic potential for 2D materials, built on an M3GNet-inspired graph neural network within the MatterSim framework and trained on a large 2D-focused dataset augmented with bulk data. The model predicts energies, forces, and stresses with high accuracy, enabling reliable structural relaxations, lattice dynamics, and molecular dynamics (MD) across 89 elements. An LLM-driven agent, connected via the Model Context Protocol (MCP), enhances usability by supporting natural-language-driven structure search, property prediction, and simulations for high-throughput materials discovery. Together, the universal IAP and intelligent agent offer a scalable platform for rapid 2D materials screening, design, and theoretical exploration. The paper also outlines current limitations and avenues for future improvement in magnetism, temperature-range coverage, and electronic-property integration.

Abstract

Accurate interatomic potentials (IAPs) are essential for modeling the potential energy surfaces (PES) that govern atomic interactions in materials. However, most existing IAPs are developed for bulk materials and struggle to accurately and efficiently capture the diverse chemical environments of two-dimensional (2D) materials. This limitation poses a significant barrier to the large-scale design and simulation of emerging 2D systems. To address this challenge, we present a universal interatomic potential tailored for 2D materials. Our model is trained on a dataset comprising 327,062 structure-energy-force-stress mappings derived from 20,114 2D materials, spanning 89 chemical elements. The results show high predictive accuracy, with mean absolute errors of 6 meV/atom for energies, 80 meV/Å for atomic forces, and 0.067 GPa for stress tensors. The model demonstrates broad applicability across a range of atomistic tasks, including structural relaxation, lattice dynamics, molecular dynamics, and material discovery. To further enhance usability and accessibility, we introduce an intelligent agent powered by a large language model (LLM), enabling natural language interaction for 2D materials property simulations. Our work provides not only a precise and universal IAP for 2D systems, but also an intelligent, user-friendly platform that enables high-throughput screening, property prediction, and theoretical exploration, thereby accelerating advances in 2D materials research.
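
To make the error metrics quoted in the abstract concrete, the following is a minimal sketch of how a mean absolute error (MAE) for energies and forces could be computed. The function names and the toy numbers are hypothetical and are not taken from the paper. The sketch also assumes the force MAE is averaged over individual Cartesian components, which the abstract does not specify.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def force_component_mae(predicted_forces, reference_forces):
    """MAE over all Cartesian force components (assumed convention).

    Each argument is a list of [fx, fy, fz] entries, one per atom.
    """
    flat_pred = [c for force in predicted_forces for c in force]
    flat_ref = [c for force in reference_forces for c in force]
    return mean_absolute_error(flat_pred, flat_ref)


# Toy data (hypothetical): per-atom energies in eV/atom, forces in eV/Å.
dft_energies = [-3.512, -4.120, -2.987]
ml_energies = [-3.506, -4.127, -2.990]
dft_forces = [[0.12, -0.25, 0.05], [-0.07, 0.20, -0.10]]
ml_forces = [[0.10, -0.20, 0.05], [-0.10, 0.20, -0.05]]

energy_mae_mev = 1000.0 * mean_absolute_error(ml_energies, dft_energies)
force_mae_mev = 1000.0 * force_component_mae(ml_forces, dft_forces)
print(f"energy MAE: {energy_mae_mev:.2f} meV/atom")
print(f"force MAE:  {force_mae_mev:.2f} meV/Å")
```

Reporting the energy error per atom keeps structures of different sizes comparable, which is why the abstract quotes meV/atom rather than a total-energy error.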

Paper Structure

This paper contains 27 sections, 10 equations, 20 figures, 4 tables.

Figures (20)

  • Figure 1: Visualization of the dataset used for ML-IAP training. (a) Periodic table colored by the (log-scaled) frequency of elemental occurrences in the dataset. (b) Distribution of atomic force components versus energy per atom. (c) Distribution of stress components versus energy per atom.
  • Figure 2: Parity plots of ML-predicted values against the corresponding DFT values for (a) energy (eV/atom), using 20,000 randomly selected training structures, (b) force in the x-direction (eV/Å), (c) force in the y-direction (eV/Å), and (d) force in the z-direction (eV/Å). The color bar represents the absolute error between the predicted and true values. The red dashed line indicates perfect agreement (y = x).
  • Figure 3: Comparison between DFT and ML predictions for lattice parameters, area errors, and energy errors. (a) The cumulative distribution function (CDF) of the relative area error $\left| \Delta S / S \right|$ between DFT and ML calculations. The shaded area between the curves indicates the error region, colored according to the error value, with annotations marking the 50%, 80%, and 95% cumulative percentages. (b) The CDF of the absolute energy error $\left| \hat{E} - E_{\text{gs}} \right|$ (eV/atom) between DFT and ML calculations. Similarly, the shaded region shows the error difference between DFT and ML, with horizontal lines marking the percentile values. (c) Scatter plot comparing the DFT and ML lattice parameters (a and b) combined, where each data point represents a pair (a, b) from both methods. The data points are colored according to the energy difference (eV/atom), and the diagonal dashed red line represents the perfect agreement line. (d) Scatter plot comparing the DFT and ML calculated areas, similarly colored according to the energy difference. The diagonal line again indicates ideal agreement. Color intensity in the CDF plots (a and b) indicates the magnitude of the relative error, while the scatter plots (c and d) use color to represent energy differences.
  • Figure 4: Comparison between ML predictions and DFT calculations for various mechanical and vibrational properties. (a) Layer modulus ($L_m$) predictions versus DFT-calculated values for materials from the C2DB database. (b) Shear modulus ($G_m$) predictions compared to DFT-calculated values, also from the C2DB database. For both (a) and (b), the solid blue line represents the ideal agreement (y = x), and the red line is a linear fit to the data points. (c) Distribution of the differences between ML-predicted and DFT-calculated equilibrium strains, based on 100 randomly selected structures. (d) Comparison of ML exfoliation predictions against DFT-calculated values for materials in the 2dMatpedia dataset, with the ideal line (blue). (e) Comparison of ML average phonon frequency predictions against DFT-calculated values for materials in the MC2D dataset, with the fitted regression line (red) shown alongside the ideal line (blue).
  • Figure 5: Arrhenius plots of the natural logarithm of the diffusion coefficient $\ln(D)$ versus inverse temperature $1000/T$ for Li$^+$ diffusion in the Mo$_{60}$S$_{120}$Li$_{15}$ and Mo$_{60}$S$_{120}$Li$_{20}$ systems. The dashed lines represent linear fits to AIMD and ML data, with corresponding activation energies ($E_a$) extracted from the slopes. The inset image illustrates the atomic structure of the Mo$_{60}$S$_{120}$Li$_{15}$ system used in the simulations.
  • ...and 15 more figures
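
The Figure 5 caption describes extracting activation energies from the slopes of Arrhenius fits. A minimal sketch of that procedure is given below, assuming the standard Arrhenius form $D = D_0 \exp(-E_a / k_B T)$, so that $\ln(D)$ plotted against $1000/T$ has slope $-E_a / (1000\,k_B)$. The function name and the synthetic data are hypothetical and do not reproduce any values from the paper.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_arrhenius(temperatures_k, diffusion_coeffs):
    """Least-squares line through ln(D) versus 1000/T.

    Returns (activation energy in eV, prefactor D_0 in the units of D).
    """
    xs = [1000.0 / t for t in temperatures_k]
    ys = [math.log(d) for d in diffusion_coeffs]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # slope = -E_a / (1000 * k_B) because the x-axis is 1000/T.
    activation_energy = -slope * 1000.0 * K_B_EV_PER_K
    return activation_energy, math.exp(intercept)


# Synthetic data (hypothetical) generated from a known E_a and D_0,
# so the fit can be checked against the values that produced it.
temperatures = [600.0, 800.0, 1000.0, 1200.0]  # K
true_ea, true_d0 = 0.30, 1.0e-3                # eV, arbitrary units
diffusion = [true_d0 * math.exp(-true_ea / (K_B_EV_PER_K * t))
             for t in temperatures]

fitted_ea, fitted_d0 = fit_arrhenius(temperatures, diffusion)
print(f"E_a = {fitted_ea:.3f} eV, D_0 = {fitted_d0:.3e}")
```

In practice the diffusion coefficients would come from mean-squared-displacement analysis of the MD trajectories at each temperature. The same fit applied to AIMD and ML trajectories gives two activation energies that can be compared directly.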