A Unified Predictive and Generative Solution for Liquid Electrolyte Formulation

Zhenze Yang, Yifan Wu, Xu Han, Ziqing Zhang, Haoen Lai, Zhenliang Mu, Tianze Zheng, Siyuan Liu, Zhichen Pu, Zhi Wang, Zhiao Yu, Sheng Gong, Wen Yan

Abstract

Liquid electrolytes are critical components of next-generation energy storage systems, enabling fast ion transport, minimizing interfacial resistance, and ensuring electrochemical stability for long-term battery performance. However, measuring electrolyte properties and designing formulations remain experimentally and computationally expensive. In this work, we present a unified framework for designing liquid electrolyte formulations that integrates a forward predictive model with an inverse generative approach. Leveraging both computational and experimental data collected from the literature and from extensive molecular simulations, we train a predictive model capable of accurately estimating electrolyte properties ranging from ionic conductivity to solvation structure. Our physics-informed architecture preserves permutation invariance and incorporates empirical dependencies on temperature and salt concentration, making it broadly applicable to property prediction tasks across molecular mixtures. Furthermore, we introduce what is, to the best of our knowledge, the first generative machine learning framework for molecular mixture design, demonstrated on electrolyte systems. This framework supports multi-condition-constrained generation, addressing the inherently multi-objective nature of materials design. As a proof of concept, we experimentally identified three liquid electrolytes with both high ionic conductivity and an anion-concentrated solvation structure. This unified framework advances data-driven electrolyte design and can be readily extended to other complex chemical systems beyond electrolytes.
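
To make the permutation-invariance property mentioned in the abstract concrete, the following is a minimal NumPy sketch of attention-style pooling over component embeddings. It is an illustration under stated assumptions, not the architecture used in the paper: the function name `attention_pool`, the scoring vector `w_score`, and the use of log mole fractions as a score bias are all hypothetical choices made for this example.

```python
import numpy as np


def attention_pool(mol_emb, mole_frac, w_score):
    """Pool per-molecule embeddings (n, d) into one mixture embedding (d,).

    Hypothetical sketch: each component gets a scalar score from its own
    embedding, biased by its (strictly positive) mole fraction. Because the
    softmax weights and the weighted sum are computed per component, the
    result does not depend on the order in which components are listed.
    """
    scores = mol_emb @ w_score + np.log(mole_frac)  # shape (n,)
    weights = np.exp(scores - scores.max())         # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ mol_emb                        # shape (d,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(4, 8))             # 4 components, 8-dim embeddings
    frac = np.array([0.4, 0.3, 0.2, 0.1])     # mole fractions
    w = rng.normal(size=8)                    # stand-in for a learned parameter
    perm = np.array([2, 0, 3, 1])             # reorder the components
    same = np.allclose(attention_pool(emb, frac, w),
                       attention_pool(emb[perm], frac[perm], w))
    print("order-independent:", same)
```

Reordering the components together with their mole fractions leaves the pooled vector unchanged up to floating-point rounding, which is the property a mixture representation needs.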

Paper Structure

This paper contains 22 sections, 31 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: A predictive and generative electrolyte design workflow reported in this work. a Forward (prediction) and inverse (generation) processes of electrolyte formulation are designed as three-stage workflows, using molecular embeddings (representations of the individual component molecules in an electrolyte formulation) and electrolyte embeddings (permutation-invariant representations of the entire electrolyte formulation). b Three stages of the predictive model for conductivity and anion ratio prediction: (1) A GNN model is trained on a single-molecule dataset for multi-property prediction, generating universal molecular embeddings. (2) MD data for around 100,000 different electrolyte formulations are used to construct an informative electrolyte embedding from the molecular embeddings. (3) An empirical relation is integrated into the model architecture and fine-tuned with 10,000+ experimental conductivity data points from the literature. c Three stages of the generative model given property conditions: (1) A conditional diffusion model generates electrolyte embeddings based on the specified properties. (2) The generated electrolyte embeddings are converted back to molecular embeddings with a decoder. (3) Finally, the molecular embeddings are matched against our chemical database to obtain the electrolyte formulation.
  • Figure 2: Prediction performance. a Comparison of model performance across various molecular properties during molecular pretraining. Our GNN model uses atom and bond features as input and is based on an EGT model (red bar, "Atom/bond feature + EGT"). Two other models serve as baselines (blue bars, "Morgan fingerprint + NN" and "Atom/bond feature + GAT"). b Anion ratio and ionic conductivity obtained from MD simulations using an OPLS force field. Data with density below 0.005 are omitted from the figure. Here, conductivity calculated using Mistry's method is used because it generally aligns better with experiments (Fig. \ref{si_fig:cond_compare}). c Predictions of our model versus ground truth after experimental fine-tuning for both anion ratio and conductivity. d Example predictions of the temperature and concentration dependence of conductivity across various electrolyte systems using the empirical equation. $T_0$ is a learnable temperature parameter in the empirical equation, which is generally related to the glass transition temperature of the electrolyte.
  • Figure 3: Generation performance. a Extrapolation of target properties using the generative model. To extrapolate conductivity, the anion ratio is fixed at the dataset’s mean value. Similarly, to extrapolate the anion ratio, conductivity is set to its mean. b Example distribution of generated electrolyte formulations given a target conductivity (10.0 mS/cm) and anion ratio (0.3). More results for different conditions can be found in Fig. \ref{si_fig:cond_generation}. c One example of a generated electrolyte formulation from Fig. \ref{fig:generation}b. More examples are listed in Fig. \ref{si_fig:example_formulation_5.0} to Fig. \ref{si_fig:example_formulation_30.0}. d Evaluation of generation using three metrics: MAPE and two diversity scores. e Schematic of classifier-guided conditional diffusion, which ensures that the generated formulation satisfies the base formulation constraints. f Performance comparison between conditional generation and classifier-guided generation in satisfying the base formulation constraints.
  • Figure 4: Experimental validation. a Experimentally measured conductivities of generated formulations (target conductivity in conditional generation = 15 or 20 mS/cm). b Raman spectrum of FSI$^{-}$ comparing generated formulations with single-solvent systems. The detailed molar ratios of each solvent component in the formulations are omitted from the figure and can be found in Table \ref{si_tab:exp_formulations}.
  • Figure S1: Detailed model architectures within the workflow reported in this work. a Overall workflow of both the predictive and generative processes. Solvents and Li salts are treated separately in the workflow. b Module architecture of the GNN model, which takes the SMILES of a single molecule as input and generates a universal molecular embedding through multi-task learning on 11 molecular properties. c A self-attention-based aggregation block that merges multiple molecular embeddings into a mixture embedding and ensures permutation invariance. "$\times$" is row-wise multiplication, "$\odot$" stands for matrix multiplication, and "$\langle, \rangle$" represents the inner product. d The empirical relation block for conductivity and anion ratio prediction. For conductivity, six learnable empirical parameters (excluding viscosity) are derived from the electrolyte embedding, while the anion ratio is predicted directly from the electrolyte embedding concatenated with temperature and concentration using a readout MLP layer. Activation layers are omitted from the schematic. e The decoder module, which recovers molecular embeddings from mixture embeddings for both solvents and salts; the recovered embeddings are then matched to molecules in our electrolyte molecule database.
  • ...and 12 more figures
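
The final generation stage described in the captions above (matching decoded molecular embeddings against a chemical database) can be illustrated with a nearest-neighbor lookup. This is a minimal sketch under stated assumptions: the captions do not specify the similarity measure, so cosine similarity is an assumption here, and the function name `match_to_database` and the placeholder names `mol_A`, `mol_B`, `mol_C` are hypothetical.

```python
import numpy as np


def match_to_database(decoded, db_emb, db_names):
    """Return, for each decoded embedding, the name of the closest database entry.

    decoded:  (m, d) array of decoded molecular embeddings
    db_emb:   (k, d) array of embeddings of known molecules
    db_names: list of k identifiers, aligned with the rows of db_emb
    Closeness is cosine similarity (an assumption made for this sketch).
    """
    d = decoded / np.linalg.norm(decoded, axis=1, keepdims=True)
    b = db_emb / np.linalg.norm(db_emb, axis=1, keepdims=True)
    sim = d @ b.T                   # (m, k) cosine similarities
    best = sim.argmax(axis=1)       # index of the most similar entry per row
    return [db_names[i] for i in best]


if __name__ == "__main__":
    # Toy database with three placeholder molecules in a 3-dim embedding space.
    db = np.eye(3)
    names = ["mol_A", "mol_B", "mol_C"]
    noisy = np.array([[0.1, 0.9, 0.0],
                      [0.8, 0.1, 0.1]])
    print(match_to_database(noisy, db, names))
```

A lookup of this kind guarantees that every generated component is a molecule that actually exists in the database, at the cost of snapping the continuous embedding to its nearest known neighbor.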