Machine Learning for Electrode Materials: Property Prediction via Composition

Hao Wu, Cameron Hargreaves, Arpit Mishra, Gian-Marco Rignanese

TL;DR

This work benchmarks three leading machine learning (ML) frameworks (MODNet, CrabNet, and a random forest model based on Magpie features) for predicting properties of battery electrode materials using the Materials Project Battery Explorer dataset, and suggests that ML models are highly effective for early-stage compositional screening in the battery industry.

Abstract

In this work, we benchmark three leading machine learning (ML) frameworks (MODNet, CrabNet, and a random forest model based on Magpie features) for predicting properties of battery electrode materials using the Materials Project Battery Explorer dataset. We evaluate these models on predictive accuracy, visualize their numerical features using two-dimensional embeddings, and quantify performance using standard metrics. Our results demonstrate that CrabNet consistently outperforms the other models across all tests. To validate these findings, we employ robust statistical methods: bootstrap resampling and two cross-validation (CV) strategies (leave-one-cluster-out and stratified 5-fold CV), comparing each model against a control baseline. In addition, we apply unsupervised clustering to MODNet-derived features using t-SNE and DBSCAN, revealing coherent material groupings without prior labels. This analysis confirms the robustness of the evaluated models and underscores the potential of ML-driven approaches for accelerating electrode materials discovery. However, our study also identifies practical limitations and quantifies challenges associated with integrating ML models into materials science workflows. Despite these constraints, our findings suggest that ML models are highly effective for early-stage compositional screening in the battery industry. This work provides a foundation for future research on ML applications in materials discovery.
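
The two validation ideas named in the abstract, leave-one-cluster-out CV and bootstrap resampling, can be illustrated with a minimal standard-library Python sketch. It is not the authors' code: the function names `leave_one_cluster_out` and `bootstrap_mae`, the use of mean absolute error as the metric, the 95% percentile interval, and the toy cluster labels are all assumptions made for illustration. Cluster labels are assumed to come from a prior clustering step such as DBSCAN.

```python
import random
from statistics import mean


def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_idx, test_idx), one fold per cluster.

    Each fold holds out every sample of one cluster, so the model is
    evaluated on a group of materials it has not seen during training.
    """
    for held_out in sorted(set(cluster_labels)):
        test_idx = [i for i, c in enumerate(cluster_labels) if c == held_out]
        train_idx = [i for i, c in enumerate(cluster_labels) if c != held_out]
        yield held_out, train_idx, test_idx


def bootstrap_mae(y_true, y_pred, n_resamples=1000, seed=0):
    """Return (mae, lower, upper) using a 95% percentile bootstrap interval.

    MAE is an assumed metric here; any per-sample error could be resampled
    in the same way.
    """
    rng = random.Random(seed)
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    # Resample the per-sample errors with replacement and recompute the mean.
    resampled = sorted(mean(rng.choices(errors, k=n)) for _ in range(n_resamples))
    lower = resampled[int(0.025 * n_resamples)]
    upper = resampled[int(0.975 * n_resamples) - 1]
    return mean(errors), lower, upper


# Hypothetical cluster labels for six materials (three clusters).
toy_labels = [0, 0, 1, 2, 2, 2]
for cluster, train_idx, test_idx in leave_one_cluster_out(toy_labels):
    print(f"held-out cluster {cluster}: train={train_idx} test={test_idx}")
```

In a real benchmark, each fold's `train_idx` and `test_idx` would be used to fit and score a model, and `bootstrap_mae` would be applied to the resulting predictions to attach an uncertainty estimate to the reported error.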

Paper Structure (1 section, 2 equations, 12 figures, 4 tables)

Table of Contents

  1. Appendix

Figures (12)

  • Figure 1: Distributions of the target properties in the dataset. Solid blue and dashed red lines indicate the median ($M$) and mean ($\mu$) values. Dashed green and purple lines denote the empirically observed $\mu \pm 2\sigma$ (inner green band) and $\mu \pm 3\sigma$ (outer purple band), with the percentage of the dataset that falls within each interval overlaid. Logarithmic scaling is applied to the count on the y-axis.
  • Figure 2: Distribution of working ions in the electrode materials dataset.
  • Figure 3: Three t-SNE embeddings of the materials in the dataset, using input features from (a) MODNet, (b) CrabNet (with mat2vec node embeddings) and (c) Magpie. The points have been colored according to the working ion in the cathode.
  • Figure 4: 2D map of the t-SNE embeddings of the materials using input features from MODNet (a, b) and CrabNet (mat2vec) (c, d). The points have been colored according to gravimetric (a, c) and volumetric (b, d) capacity values according to the colorbars.
  • Figure 5: 2D map of t-SNE embeddings of the materials using input features from MODNet. The points have been colored based on DBSCAN clustering. A total of 14 clusters are identified. The representative material from each cluster, as selected by ElMD mean representative, is indicated together with the cluster number.
  • ...and 7 more figures
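
The empirical coverage reported in Figure 1 (the percentage of the dataset within $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$) can be computed as in the following minimal sketch. The function name `interval_coverage`, the use of the population standard deviation, and the toy values are assumptions for illustration; the paper's own plotting code is not shown in this excerpt.

```python
from statistics import mean, median, pstdev


def interval_coverage(values, k):
    """Return the percentage of values inside [mu - k*sigma, mu + k*sigma]."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    inside = sum(1 for v in values if mu - k * sigma <= v <= mu + k * sigma)
    return 100.0 * inside / len(values)


# Hypothetical, strongly skewed property values (not taken from the dataset).
toy_values = [1.0, 2.0, 3.0, 4.0, 100.0]
print("median:", median(toy_values), "mean:", mean(toy_values))
for k in (1, 2, 3):
    print(f"within mu +/- {k} sigma: {interval_coverage(toy_values, k):.1f}%")
```

For skewed distributions such as those the caption describes, the observed coverage can differ noticeably from the Gaussian values of roughly 95% and 99.7%, which is why reporting the empirical percentages is informative.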