OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes
Félix Therrien, Jamal Abou Haibeh, Divya Sharma, Rhiannon Hendley, Leah Wairimu Mungai, Sun Sun, Alain Tchagang, Jiang Su, Samuel Huberman, Yoshua Bengio, Hongyu Guo, Alex Hernández-García, Homin Shin
TL;DR
OBELiX addresses the scarcity of open, experimentally measured ionic conductivity data with full crystal descriptions for lithium solid-state electrolytes (SSEs). It curates ~600 materials with room-temperature ionic conductivities, provides crystallographic information files (CIFs) for 321 of them, and supplies leakage-free train/test splits to enable robust ML benchmarking. An evaluation of seven models shows that simple baselines such as random forests and multilayer perceptrons can outperform more complex graph neural networks in this small-data regime, underscoring the importance of data quality and of how partial occupancy is represented. By providing an open, well-documented dataset and a rigorous evaluation protocol, OBELiX aims to catalyze ML-driven discovery and validation of SSE materials and to support future benchmarking of molecular dynamics (MD) and machine-learned force fields (MLFFs) in low-data contexts.
Abstract
Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theoretical calculations and experimental validation is both time-consuming and resource-intensive. While machine learning holds the promise to expedite this process, relevant ionic conductivity and structural data are scarce. Here, we present OBELiX, a database of $\sim$600 synthesized solid electrolyte materials and their experimentally measured room-temperature ionic conductivities, gathered from the literature and curated by domain experts. Each material is described by its measured composition, space group, and lattice parameters. A full-crystal description in the form of a crystallographic information file (CIF) is provided for the $\sim$320 structures for which atomic positions were available. We discuss various statistics and features of the dataset and provide training and testing splits carefully designed to avoid data leakage. Finally, we benchmark seven existing ML models on the task of predicting ionic conductivity and discuss their performance. The goal of this work is to facilitate the use of machine learning for solid-state electrolyte materials discovery.
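The abstract does not say how the leakage-free splits are constructed. As an illustration only, one common way to avoid leakage is a group-aware split, in which all entries sharing a group label end up on the same side of the split. The sketch below assumes each entry carries such a label; the function name `group_split`, the `source` field, and the toy records are hypothetical and are not taken from the OBELiX dataset or code.

```python
import random
from collections import defaultdict


def group_split(entries, group_key, test_fraction=0.2, seed=0):
    """Split entries so that no group appears in both train and test.

    Hypothetical helper for illustration; not the OBELiX split procedure.
    """
    # Collect entries by their group label.
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[group_key]].append(entry)

    # Shuffle group labels deterministically so the split is reproducible.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    # Assign whole groups to the test set until it reaches the target size;
    # every remaining group goes to the training set.
    n_test_target = test_fraction * len(entries)
    train, test = [], []
    for gid in group_ids:
        target = test if len(test) < n_test_target else train
        target.extend(groups[gid])
    return train, test


# Toy records with made-up identifiers and group labels (assumption).
entries = [
    {"id": "m0", "source": "ref_A"},
    {"id": "m1", "source": "ref_A"},
    {"id": "m2", "source": "ref_A"},
    {"id": "m3", "source": "ref_B"},
    {"id": "m4", "source": "ref_B"},
    {"id": "m5", "source": "ref_C"},
    {"id": "m6", "source": "ref_C"},
    {"id": "m7", "source": "ref_D"},
]

train, test = group_split(entries, "source", test_fraction=0.25)
train_sources = {e["source"] for e in train}
test_sources = {e["source"] for e in test}
print(sorted(train_sources), sorted(test_sources))
```

Because whole groups are assigned at once, the realized test fraction can overshoot the requested one when groups are large. The choice of grouping key (for example source publication or structural family) determines which kind of leakage is prevented.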
