Learning Magnetic Order Classification from Large-Scale Materials Databases

Ahmed E. Fahmy

Abstract

The reliable identification of magnetic ground states remains a major challenge in high-throughput materials databases, where density functional theory (DFT) workflows often converge to ferromagnetic (FM) solutions. Here, we partially address this challenge by developing machine learning classifiers trained on experimentally validated magnetic materials from MAGNDATA, using a small set of simple compositional, structural, and electronic descriptors sourced from the Materials Project database. Our propagation vector classifiers achieve accuracies above 92%, outperforming recent studies in distinguishing zero from nonzero propagation vector structures and exposing, for more than 7,843 materials, a systematic ferromagnetic bias inherent to the Materials Project database. In parallel, LightGBM and XGBoost models trained directly on the Materials Project labels achieve accuracies of 84-86% (with macro-averaged F1 scores of 63-66%), which makes them useful for large-scale screening of magnetic classes when refined by MAGNDATA-trained classifiers. These results underscore the role of machine learning techniques as corrective and exploratory tools, enabling more trustworthy databases and accelerating progress toward the identification of materials with various properties.
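
The gap between the reported accuracies and macro F1 scores is what one would expect when class frequencies are uneven: accuracy rewards getting the majority class right, while macro F1 weights every class equally. The following is a minimal sketch of that effect on a hypothetical toy label set (the labels, counts, and helper names `accuracy` and `macro_f1` are illustrative assumptions, not data or code from the paper).

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 8 FM and 2 AFM samples, and a model that
# always predicts the majority class (FM).
y_true = ["FM"] * 8 + ["AFM"] * 2
y_pred = ["FM"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {mf1:.3f}")
```

In this toy case the always-FM predictor reaches 80% accuracy but a macro F1 of only about 0.44, because the AFM class contributes an F1 of zero. This is the kind of majority-class bias that macro F1 is designed to expose.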

Paper Structure

This paper contains 4 sections, 1 equation, 10 figures, 2 tables.

Figures (10)

  • Figure 1: (a) Distribution of magnetic classes in the Materials Project database without filtering. (b) Distribution of magnetic classes restricted to Materials Project compounds containing at least one magnetic element.
  • Figure 2: Distributions (log-scale counts) of selected features in the refined Materials Project dataset, restricted to compounds containing at least one magnetic element and resolved by magnetic class: (a) electronic band gap, (b) valence band maximum (VBM), (c) mass density, and (d) atomic density. These descriptors highlight systematic differences between ferromagnetic (FM), antiferromagnetic (AFM), ferrimagnetic (FiM), and nonmagnetic (NM) compounds.
  • Figure 3: Distributions of magnetic classes in the MAGNDATA dataset: (a) magnetic order labels, including ferromagnetic (FM), ferrimagnetic (FiM), antiferromagnetic (AFM), and complex orders; (b) propagation vector labels, distinguishing between zero and nonzero $k$-vectors. These distributions highlight the prevalence of AFM and complex orders in experimentally determined magnetic structures, as well as the near balance between zero and nonzero propagation vectors.
  • Figure 4: Performance of different classifiers on the Materials Project validation set, measured by accuracy (blue) and macro-$F_1$ score (green). Simple baselines such as the dummy and decision tree classifiers perform significantly worse than ensemble methods, with Random Forest, LightGBM, and XGBoost achieving the highest accuracies (0.82–0.85) and macro-$F_1$ scores (0.62–0.66).
  • Figure 5: Confusion matrices for the LightGBM classifier applied to the Materials Project dataset (restricted to compounds containing at least one magnetic element), shown under two labeling schemes: (a) four-class classification (FM, AFM, FiM, NM) and (b) binary classification (magnetic vs. nonmagnetic).
  • ...and 5 more figures
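
The two labeling schemes in Figure 5 can be related by merging the FM, AFM, and FiM classes into a single "magnetic" class. The sketch below collapses a four-class confusion matrix into a binary one under that assumption. The counts are hypothetical and the helper `collapse_to_binary` is illustrative; the source does not state whether the binary matrix was obtained this way or from a separately trained binary model.

```python
LABELS = ["FM", "AFM", "FiM", "NM"]
MAGNETIC = {"FM", "AFM", "FiM"}


def collapse_to_binary(cm, labels, magnetic):
    """Merge a multi-class confusion matrix (rows = true, columns = predicted)
    into a 2x2 matrix with index 0 = magnetic and index 1 = nonmagnetic."""
    out = [[0, 0], [0, 0]]
    for i, true_label in enumerate(labels):
        for j, pred_label in enumerate(labels):
            row = 0 if true_label in magnetic else 1
            col = 0 if pred_label in magnetic else 1
            out[row][col] += cm[i][j]
    return out


def matrix_accuracy(cm):
    """Accuracy = trace / total count."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


# Hypothetical four-class counts (NOT taken from the paper).
cm4 = [
    [50, 5, 3, 2],   # true FM
    [6, 20, 2, 2],   # true AFM
    [4, 3, 10, 3],   # true FiM
    [1, 1, 1, 37],   # true NM
]

cm2 = collapse_to_binary(cm4, LABELS, MAGNETIC)
print("binary matrix:", cm2)
print("four-class accuracy:", round(matrix_accuracy(cm4), 3))
print("binary accuracy:", round(matrix_accuracy(cm2), 3))
```

Confusions among FM, AFM, and FiM count as errors in the four-class scheme but as correct "magnetic" predictions after merging, so the collapsed binary accuracy can never be lower than the four-class accuracy.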