Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models

Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi

TL;DR

The paper introduces the Open Materials 2024 (OMat24) dataset, a large-scale public DFT collection designed to advance AI-driven inorganic materials discovery. It presents EquiformerV2-based models trained on diverse non-equilibrium configurations, covering three training strategies and multiple model sizes, and evaluates them on Matbench Discovery. Pre-training on OMat24 followed by fine-tuning achieves a state-of-the-art F1 score (~0.916) and a formation-energy accuracy of ~20 meV/atom, significantly outperforming compliant baselines. The work emphasizes open data and transfer-learning potential, and points to more accurate functionals and MD/MC applications as future directions for AI-assisted materials science.

Abstract

The ability to discover new materials with desirable properties is critical for numerous applications, from helping mitigate climate change to advances in next-generation computing hardware. AI has the potential to accelerate materials discovery and design by exploring chemical space more effectively than other computational methods or trial and error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtrj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.
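
The two headline metrics in the abstract can be illustrated with a minimal sketch. It assumes that a structure counts as stable when its energy above the convex hull is at or below 0 eV/atom, and that the quoted "accuracy" is a mean absolute error; neither definition is spelled out in the text above. The function names and the toy numbers are hypothetical and are not taken from the paper or from any benchmark code.

```python
from typing import Sequence


def stability_f1(
    e_hull_true: Sequence[float],
    e_hull_pred: Sequence[float],
    threshold: float = 0.0,  # assumed stability cutoff in eV/atom
) -> float:
    """F1 score for stable/unstable classification from hull distances."""
    tp = fp = fn = 0
    for true_val, pred_val in zip(e_hull_true, e_hull_pred):
        true_stable = true_val <= threshold
        pred_stable = pred_val <= threshold
        if true_stable and pred_stable:
            tp += 1  # correctly predicted stable
        elif pred_stable:
            fp += 1  # predicted stable, actually unstable
        elif true_stable:
            fn += 1  # predicted unstable, actually stable
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mae_mev_per_atom(
    e_true: Sequence[float], e_pred: Sequence[float]
) -> float:
    """Mean absolute error, converted from eV/atom to meV/atom."""
    total = sum(abs(t - p) for t, p in zip(e_true, e_pred))
    return 1000.0 * total / len(e_true)


# Toy values in eV/atom, invented purely for illustration.
true_vals = [-0.05, 0.10, -0.01, 0.03]
pred_vals = [-0.03, 0.08, 0.02, -0.01]

f1 = stability_f1(true_vals, pred_vals)
mae = mae_mev_per_atom(true_vals, pred_vals)
print(f"F1 = {f1:.3f}, MAE = {mae:.1f} meV/atom")
```

With these toy values one structure is a true positive, one a false positive, and one a false negative, so the F1 score is 0.5 and the error is 27.5 meV/atom. An F1 above 0.9 with an error near 20 meV/atom therefore means that very few stable structures are missed or wrongly flagged.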

Paper Structure

This paper contains 19 sections, 7 figures, 14 tables.

Figures (7)

  • Figure 1: Overview of the OMat24 dataset generation, application areas, and sampling strategies. Inset images are a random sample across the different sampling strategies.
  • Figure 2: (a) Energy per atom, forces norm and max absolute stress element distributions for MPtrj, Alexandria and OMat24 datasets. (b) Distribution of elements in the OMat24 dataset.
  • Figure 3: Formation energy taken directly from the WBM dataset (Wang et al., 2021) and formation energy calculated from DFT calculations with OMat DFT settings. Outliers are primarily elements with updated pseudopotentials.
  • Figure 4: Histogram of the number of atoms per structure for each sub-dataset in the OMat24 dataset.
  • Figure 5: Energy, forces norm and maximum absolute stress densities for all sub-datasets in OMat24.
  • ...and 2 more figures