Improving Performance Prediction of Electrolyte Formulations with Transformer-based Molecular Representation Model
Indra Priyadarsini, Vidushi Sharma, Seiji Takeda, Akihiro Kishimoto, Lisa Hamada, Hajime Shinohara
TL;DR
The study tackles the challenge of predicting electrolyte performance in multi-component formulations by learning a robust, composition-aware molecular representation. It combines a SELFIES-based BART pretraining regime with a simple, fixed-dimension formulation vector $SA = \sum_{i=1}^n c_i \boldsymbol{r_i}$, where $\boldsymbol{r_i} \in \mathbb{R}^d$, enabling effective downstream property prediction. Empirically, the approach achieves state-of-the-art RMSE on Coulombic efficiency ($\text{RMSE}=0.148$) and Li|I full-cell specific capacity ($\text{RMSE}=20.001$ mAh/g), outperforming competing methods such as F-GCN variants and MolFormer-based baselines. This indicates the method’s potential to accelerate electrolyte formulation discovery by providing a scalable, data-efficient representation that captures constituent interactions in high-dimensional chemical spaces.
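The formulation vector above is a weighted sum of per-constituent embeddings, so its dimension stays fixed at $d$ no matter how many constituents a formulation contains. The following is a minimal sketch of that aggregation, not the authors' implementation. It assumes that $c_i$ is a composition weight for constituent $i$ (for example a molar fraction) and that each $\boldsymbol{r_i}$ has already been produced by a molecular encoder. The function name `formulation_vector` and the toy 3-dimensional embeddings are hypothetical and chosen only for illustration.

```python
import numpy as np


def formulation_vector(fractions, embeddings):
    """Return SA = sum_i c_i * r_i for one formulation.

    fractions  : length-n sequence of composition weights c_i (assumed meaning).
    embeddings : (n, d) array whose i-th row is the constituent embedding r_i.
    """
    c = np.asarray(fractions, dtype=float)
    r = np.asarray(embeddings, dtype=float)
    if r.ndim != 2 or c.shape != (r.shape[0],):
        raise ValueError("expected n fractions and an (n, d) embedding matrix")
    # (n,) @ (n, d) -> (d,): the weighted sum over constituents.
    return c @ r


# Toy example: two constituents with made-up 3-dimensional embeddings.
embeddings = np.array([
    [1.0, 0.0, 2.0],  # hypothetical embedding of constituent 1
    [0.0, 1.0, 1.0],  # hypothetical embedding of constituent 2
])
fractions = [0.75, 0.25]

sa = formulation_vector(fractions, embeddings)
print(sa)  # [0.75 0.25 1.75]
```

Because the output has shape `(d,)` regardless of `n`, formulations with different numbers of constituents can be passed to the same downstream regressor without padding or masking.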
Abstract
Development of efficient and high-performing electrolytes is crucial for advancing energy storage technologies, particularly in batteries. The performance of a battery electrolyte depends on complex interactions between its individual constituents. Consequently, a strategy that captures these relationships and forms a robust representation of the formulation is essential for integration with machine learning models that predict properties accurately. In this paper, we introduce a novel approach that leverages a transformer-based molecular representation model to capture the representation of electrolyte formulations effectively and efficiently. The proposed approach is evaluated on two battery property prediction tasks, and the results show superior performance compared to state-of-the-art methods.
