OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes

Félix Therrien, Jamal Abou Haibeh, Divya Sharma, Rhiannon Hendley, Leah Wairimu Mungai, Sun Sun, Alain Tchagang, Jiang Su, Samuel Huberman, Yoshua Bengio, Hongyu Guo, Alex Hernández-García, Homin Shin

TL;DR

OBELiX addresses the scarcity of open, experimentally measured ionic conductivity data with full crystal descriptions for lithium solid-state electrolytes (SSEs). It curates 599 materials with room-temperature conductivity and 321 crystallographic information files (CIFs), and provides data splits designed to avoid leakage, enabling robust ML benchmarking. An evaluation of seven models shows that simple baselines such as random forest and multilayer perceptron can outperform more complex graph neural networks in this small-data regime, underscoring the importance of data quality and of how partial occupancy is represented. By providing an open, well-documented dataset and a rigorous evaluation protocol, OBELiX aims to catalyze ML-driven discovery and validation of SSE materials and to support future benchmarking of molecular dynamics (MD) and machine-learning force fields (MLFFs) in low-data contexts.

Abstract

Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theoretical calculations and experimental validation is both time-consuming and resource-intensive. While machine learning holds the promise to expedite this process, relevant ionic conductivity and structural data are scarce. Here, we present OBELiX, a database of $\sim$600 synthesized solid electrolyte materials and their experimentally measured room-temperature ionic conductivities, gathered from the literature and curated by domain experts. Each material is described by its measured composition, space group and lattice parameters. A full-crystal description in the form of a crystallographic information file (CIF) is provided for $\sim$320 structures for which atomic positions were available. We discuss various statistics and features of the dataset and provide training and testing splits carefully designed to avoid data leakage. Finally, we benchmark seven existing ML models on the task of predicting ionic conductivity and discuss their performance. The goal of this work is to facilitate the use of machine learning for solid-state electrolyte materials discovery.

Paper Structure

This paper contains 17 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Examples of solid state electrolyte materials with partial occupancies.
  • Figure 2: a) Distributions of ionic-conductivity values for the training and testing sets along with proportions of crystal families and space groups. Only space groups that represent more than 1% of the sets are labeled. b) Venn diagram showing how OBELiX entries are shared across the ICSD, Laskowski and LiIon datasets. There are 2 OBELiX entries that are not part of any of the three datasets. c) Proportion of entries that contain each element in the periodic table. Elements that are not present in the dataset are shaded. Generated with pymatviz.
  • Figure 3: Ionic conductivity of entries in the dataset that have the same composition and space group. The color shows the largest relative difference between lattice parameters within a set of entries with same space group and composition. The inset shows the distribution of differences with the mean ionic conductivity of the sets in log scale. It is scaled proportionally to the rest of the plot.
  • Figure 4: Benchmarking of various ML models. The same data is tabulated in Table \ref{table:mae_models}. Simpler models outperform geometric GNNs.
  • Figure S1: Parity plots for benchmarked models. a) Random Forest; b) Multilayer perceptron; c) PaiNN; d) SchNet; e) M3GNet; f) SO3Net; g) CGCNN; h) PaiNN with pretraining; i) SchNet with pretraining; j) M3GNet with pretraining; k) CGCNN with pretraining; l) CGCNN with disorder (partial occupancy); m) SO3Net with disorder (partial occupancy).
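
The grouping described in the Figure 3 caption (entries sharing a composition and space group) also suggests one simple way to keep near-duplicate materials from landing on both sides of a train/test split. The following is a minimal sketch under stated assumptions, not the authors' actual splitting procedure: the entry tuples, the placeholder compositions (`LiX`, `LiY`, `LiZ`), the conductivity values, and the helper names `group_entries`, `log_spread` and `grouped_split` are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy entries, NOT taken from OBELiX:
# (composition label, space group number, ionic conductivity in S/cm).
entries = [
    ("LiX", 230, 3.0e-4),
    ("LiX", 230, 1.0e-4),
    ("LiX", 142, 2.0e-6),
    ("LiY", 137, 1.2e-2),
    ("LiZ", 62, 1.6e-4),
    ("LiZ", 62, 3.0e-7),
]


def group_entries(rows):
    """Collect conductivities of entries sharing composition and space group."""
    groups = defaultdict(list)
    for composition, space_group, sigma in rows:
        groups[(composition, space_group)].append(sigma)
    return dict(groups)


def log_spread(sigmas):
    """Range of log10(conductivity) within one group, in orders of magnitude."""
    logs = [math.log10(s) for s in sigmas]
    return max(logs) - min(logs)


def grouped_split(rows, test_fraction=0.2, seed=0):
    """Assign whole (composition, space group) groups to either train or test.

    Keeping a group on one side means no test entry shares both its
    composition and its space group with a training entry.
    """
    keys = sorted(group_entries(rows))
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test_keys = set(keys[:n_test])
    train = [r for r in rows if (r[0], r[1]) not in test_keys]
    test = [r for r in rows if (r[0], r[1]) in test_keys]
    return train, test


if __name__ == "__main__":
    for key, sigmas in group_entries(entries).items():
        print(key, f"n={len(sigmas)}", f"log10 spread={log_spread(sigmas):.2f}")
    train, test = grouped_split(entries, test_fraction=0.25, seed=1)
    print(len(train), "train entries,", len(test), "test entries")
```

Splitting by group rather than by individual entry trades some control over the exact test-set size for the guarantee that repeated measurements of the same material stay together. A real protocol might additionally balance the conductivity distribution across the two sets, as the Figure 2a caption suggests was examined.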