Hybrid Machine Learning for Enhanced Prediction of Diffusion Coefficients in Liquids

Jens Wagner; Zeno Romero; Kerstin Münnemann; Sebastian Schmitt; Thomas Specht; Hans Hasse; Fabian Jirasek

TL;DR

This work introduces a new method for predicting diffusion coefficients of molecular components at infinite dilution in pure liquid solvents by integrating the Stokes-Einstein (SE) equation with machine learning (ML).

Abstract

Diffusion coefficients are key thermophysical properties for modeling mass transport in liquids, but experimental data are scarce, making reliable prediction methods indispensable. In the present work, we introduce a new method for predicting diffusion coefficients of molecular components at infinite dilution in pure liquid solvents by integrating the Stokes-Einstein (SE) equation with machine learning (ML). Unlike previous ML approaches, the resulting hybrid Enhanced Stokes-Einstein (ESE) model provides strictly physically consistent predictions for diffusion coefficients as a function of temperature across a broad range of binary mixtures. ESE was trained and validated using an extensive compilation of literature data for infinite-dilution diffusion coefficients in binary liquid systems. It achieves significantly higher prediction accuracy than the previous state-of-the-art model, SEGWE. As additional inputs, it requires only the SMILES strings encoding the molecular formulae of the components of interest, which are always available. This simplicity makes ESE broadly applicable, e.g., for process design and optimization. The ESE model and its source code are fully disclosed, and the model is directly accessible via an interactive web interface at https://ml-prop.mv.rptu.de/.
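The hybrid idea can be illustrated with a minimal sketch: a Stokes-Einstein estimate is multiplied by a positive, mixture-specific scaling factor. The SE formula below is the standard textbook form. Everything else is an assumption made for illustration: the softplus function as the way to keep the factor positive, the helper names `softplus` and `ese_like_prediction`, and all numerical inputs. This is not the authors' implementation, and the raw network output is simply passed in as a number instead of being computed from molecular descriptors.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def stokes_einstein(temperature_k, viscosity_pa_s, radius_m):
    """Textbook Stokes-Einstein estimate D = k_B*T / (6*pi*eta*r) in m^2/s."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def softplus(z):
    """Numerically stable log(1 + exp(z)); positive for moderate z.

    Assumption: used here only as one possible way to restrict a
    network output to positive values.
    """
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def ese_like_prediction(d_se, raw_network_output):
    """Hypothetical helper: scale the SE estimate by a positive factor b."""
    b = softplus(raw_network_output)
    return b * d_se


# Illustrative inputs only (not taken from the paper's database):
# a solvent viscosity of 0.89 mPa s and a solute radius of 0.2 nm.
d_se = stokes_einstein(temperature_k=298.15, viscosity_pa_s=8.9e-4, radius_m=2.0e-10)
d_corrected = ese_like_prediction(d_se, raw_network_output=0.3)
print(f"SE estimate: {d_se:.3e} m^2/s, scaled estimate: {d_corrected:.3e} m^2/s")
```

Because the scaling factor is positive, the scaled prediction keeps the sign of the SE estimate. In this sketch, the temperature dependence enters only through the SE term and the viscosity supplied to it.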

Paper Structure

This paper contains 8 sections, 5 equations, 6 figures, and 1 table.

Introduction
Methods
Model Architecture
Molecular Descriptors
Experimental Database
Training and Evaluation
Results and Discussion
Conclusions

Figures (6)

Figure 1: Schematic overview of the hybrid Enhanced Stokes-Einstein (ESE) model for predicting liquid-phase diffusion coefficients at infinite dilution in binary mixtures, $D^{\infty}_{ij}$. The prediction of the Stokes-Einstein (SE) equation, $D^{\infty,\mathrm{SE}}_{ij}$, is corrected using a learned mixture-specific scaling factor, $b_{ij}$, computed by a positively restricted neural network from the molecular descriptor vectors $\mathbf{X}_i$ and $\mathbf{X}_j$ (cf. Table \ref{tab_x}), which are generated for solute $i$ and solvent $j$ from their SMILES strings using RDKit [RDKit2024].
Figure 2: Molecular structures and corresponding molecular descriptor vectors $\mathbf{X}$ (cf. Table \ref{tab_x}), shown by way of example for ethanol, cyclohexanone, and hexafluorobenzene.
Figure 3: Boxplots of the absolute relative error (ARE, top) and squared relative error (SRE, bottom) of the predicted diffusion coefficients at infinite dilution $D^{\infty}_{ij}$ from SE, SEGWE, and ESE. The box width indicates the interquartile range, and the whisker length is 1.5 times the interquartile range. Outliers are not depicted for visual clarity.
Figure 4: Histograms (bars) and cumulative fractions (lines) showing the number of $D^{\infty}_{ij}$ predicted with a certain absolute relative error (ARE, top) or squared relative error (SRE, bottom) with SE, SEGWE, and ESE. The shown range for the ARE covers $>99~\%$ of the predictions for all models. The shown range of the SRE covers 96.53 % of the SE predictions, 98.51 % of the SEGWE predictions, and 99.31 % of the ESE predictions.
Figure 5: Mean absolute relative error (MARE) and mean squared relative error (MSRE) of the predicted $D^{\infty}_{ij}$ from SE, SEGWE, and ESE for nine distinct solute-solvent classes, cf. text. $N$ specifies the number of test data points in our database for each class.
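The error measures named in the figure captions can be sketched as follows. The definitions are assumed to be the conventional ones, since the formulas are not given in this excerpt: ARE is the absolute deviation relative to the experimental value, SRE is the square of that relative deviation, and MARE and MSRE are their means. The helper names and the data values are made up for illustration.

```python
def relative_errors(predicted, experimental):
    """Return per-point ARE and SRE lists (assumed conventional definitions)."""
    ares, sres = [], []
    for d_pred, d_exp in zip(predicted, experimental):
        rel = (d_pred - d_exp) / d_exp  # signed relative deviation
        ares.append(abs(rel))           # absolute relative error
        sres.append(rel * rel)          # squared relative error
    return ares, sres


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Made-up diffusion coefficients in m^2/s, for illustration only.
d_exp = [1.0e-9, 2.0e-9, 4.0e-9]
d_pred = [1.1e-9, 1.8e-9, 4.0e-9]

ares, sres = relative_errors(d_pred, d_exp)
mare = mean(ares)  # mean absolute relative error
msre = mean(sres)  # mean squared relative error
print(f"MARE = {mare:.4f}, MSRE = {msre:.5f}")
```

Squaring the relative deviation weights large mispredictions more heavily, so the MSRE is more sensitive to outliers than the MARE.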
