Improving Performance Prediction of Electrolyte Formulations with Transformer-based Molecular Representation Model

Indra Priyadarsini, Vidushi Sharma, Seiji Takeda, Akihiro Kishimoto, Lisa Hamada, Hajime Shinohara

TL;DR

The study tackles the challenge of predicting electrolyte performance in multi-component formulations by learning a robust, composition-aware molecular representation. It combines a SELFIES-based BART pretraining regime with a simple, fixed-dimension formulation vector $SA = \sum_{i=1}^n c_i \boldsymbol{r_i}$, where $\boldsymbol{r_i} \in \mathbb{R}^d$, enabling effective downstream property prediction. Empirically, the approach achieves state-of-the-art RMSE on Coulombic efficiency ($\text{RMSE}=0.148$) and Li|I full-cell specific capacity ($\text{RMSE}=20.001$ mAh/g), outperforming competing methods such as F-GCN variants and MolFormer-based baselines. This indicates the method’s potential to accelerate electrolyte formulation discovery by providing a scalable, data-efficient representation that captures constituent interactions in high-dimensional chemical spaces.
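The formulation vector above is a composition-weighted sum of per-molecule embeddings. A minimal sketch in plain Python follows. It assumes the embeddings $\boldsymbol{r_i}$ have already been produced by a pretrained encoder, and that each $c_i$ is a per-constituent composition weight. The function name `formulation_vector` and the toy numbers are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def formulation_vector(
    fractions: Sequence[float],
    embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Return SA = sum_i c_i * r_i as a list of length d.

    fractions  -- composition weights c_i, one per constituent
    embeddings -- molecular embeddings r_i, each of the same length d
                  (assumed to be precomputed by some pretrained encoder)
    """
    if not embeddings:
        raise ValueError("at least one constituent is required")
    if len(fractions) != len(embeddings):
        raise ValueError("need exactly one weight per embedding")
    d = len(embeddings[0])
    if any(len(r) != d for r in embeddings):
        raise ValueError("all embeddings must share the same dimension")

    sa = [0.0] * d
    for c, r in zip(fractions, embeddings):
        for k, value in enumerate(r):
            sa[k] += c * value  # accumulate the weighted contribution
    return sa


# Toy formulation: three constituents with hypothetical 3-dimensional embeddings.
embeddings = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [4.0, 4.0, 4.0],
]
fractions = [0.5, 0.25, 0.25]

sa = formulation_vector(fractions, embeddings)
print(sa)
```

The output length equals the embedding dimension $d$ regardless of how many constituents the formulation has, which is what lets a fixed-size downstream regressor consume formulations of varying size.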

Abstract

Development of efficient and high-performing electrolytes is crucial for advancing energy storage technologies, particularly in batteries. Predicting the performance of battery electrolytes relies on complex interactions between the individual constituents. Consequently, a strategy that adeptly captures these relationships and forms a robust representation of the formulation is essential for integrating with machine learning models to predict properties accurately. In this paper, we introduce a novel approach leveraging a transformer-based molecular representation model to effectively and efficiently capture the representation of electrolyte formulations. The performance of the proposed approach is evaluated on two battery property prediction tasks, and the results show superior performance compared to state-of-the-art methods.

Paper Structure (10 sections, 1 equation, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Pre-training model architecture
  • Figure 2: Illustration of the general schematic of the proposed method. (a) shows the general format of the electrolyte formulation dataset. (b) describes the procedure to construct the feature vector for an electrolyte formulation. (c) shows the fine-tuning model trained using the feature vector for a given prediction task.
  • Figure 3: Parity plots showing predicted LCE values as scatterplots with respect to the actual values
  • Figure 4: Parity plots showing predicted battery capacities (in mAh/g) as scatterplots with respect to the actual values