Prediction of Diffusion Coefficients in Mixtures with Tensor Completion

Zeno Romero, Kerstin Münnemann, Hans Hasse, Fabian Jirasek

Abstract

Predicting diffusion coefficients in mixtures is crucial for many applications, as experimental data remain scarce, and machine learning (ML) offers promising alternatives to established semi-empirical models. Among ML models, matrix completion methods (MCMs) have proven effective in predicting thermophysical properties, including diffusion coefficients in binary mixtures. However, MCMs are restricted to single-temperature predictions, and their accuracy depends strongly on the availability of high-quality experimental data for each temperature of interest. In this work, we address this challenge by presenting a hybrid tensor completion method (TCM) for predicting temperature-dependent diffusion coefficients at infinite dilution in binary mixtures. The TCM employs a Tucker decomposition and is jointly trained on experimental data for diffusion coefficients at infinite dilution in binary systems at 298 K, 313 K, and 333 K. Predictions from the semi-empirical SEGWE model serve as prior knowledge within a Bayesian training framework. The TCM then extrapolates linearly to any temperature between 268 K and 378 K, achieving markedly improved prediction accuracy compared to established models across all studied temperatures. To further enhance predictive performance, the experimental database was expanded using active learning (AL) strategies for targeted acquisition of new diffusion data by pulsed-field gradient (PFG) NMR measurements. Diffusion coefficients at infinite dilution in 19 solute + solvent systems were measured at 298 K, 313 K, and 333 K. Incorporating these results yields a substantial improvement in the TCM's predictive accuracy. These findings highlight the potential of combining data-efficient ML methods with adaptive experimentation to advance predictive modeling of transport properties.
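The two modeling ingredients named in the abstract, a Tucker decomposition over a solute × solvent × temperature tensor and a linear extrapolation in temperature, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the factor matrices are random placeholders rather than trained parameters, the Bayesian training with SEGWE priors is omitted, and fitting a straight line through the three per-temperature predictions is only one assumed way to realize the linear extrapolation. All function and variable names are hypothetical.

```python
import numpy as np


def tucker_reconstruct(core, solute_factors, solvent_factors, temperature_factors):
    """Rebuild a (solutes x solvents x temperatures) tensor from Tucker factors.

    Entry [i, j, t] is the sum over a, b, c of
    core[a, b, c] * solute_factors[i, a] * solvent_factors[j, b] * temperature_factors[t, c].
    """
    return np.einsum(
        "abc,ia,jb,tc->ijt",
        core, solute_factors, solvent_factors, temperature_factors,
    )


def extrapolate_linear(temperatures, values, target_temperature):
    """Fit a straight line through values(T) and evaluate it at another temperature.

    Assumption: this is an illustrative stand-in for the linear temperature
    extrapolation mentioned in the abstract, not the paper's exact procedure.
    """
    slope, intercept = np.polyfit(temperatures, values, deg=1)
    return slope * target_temperature + intercept


# Random placeholder factors (hypothetical sizes, no physical meaning).
rng = np.random.default_rng(0)
n_solutes, n_solvents, n_temps = 4, 3, 3
rank = (2, 2, 2)
core = rng.normal(size=rank)
U = rng.normal(size=(n_solutes, rank[0]))   # solute factors
V = rng.normal(size=(n_solvents, rank[1]))  # solvent factors
W = rng.normal(size=(n_temps, rank[2]))     # temperature factors

tensor = tucker_reconstruct(core, U, V, W)

# The three training temperatures from the abstract, in kelvin.
temperatures = np.array([298.0, 313.0, 333.0])
prediction_at_350 = extrapolate_linear(temperatures, tensor[0, 0, :], 350.0)
```

Training on all three temperatures jointly means the solute and solvent factors are shared across the temperature mode, which is what distinguishes this setup from fitting a separate matrix completion model per temperature.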

Paper Structure (15 sections, 7 equations, 7 figures, 3 tables)


Figures (7)

  • Figure 1: Experimental data for liquid-phase diffusion coefficients $D_{ij}^{\infty\text{,exp}}$ of solutes $i$ at infinite dilution in solvents $j$ at temperatures 298 K, 313 K, and 333 K in the database used as the starting point in the present work. Numbers identify solutes and solvents, cf. Tables S1 and S2 in the Supporting Information. Solutes are ordered with respect to their molar mass (bottom: low; top: high), and solvents are ordered with respect to their viscosity (left: low; right: high). The color code indicates the value of $D_{ij}^{\infty\text{,exp}}$, and black cells denote missing data.
  • Figure 2: Experimental data for liquid-phase diffusion coefficients $D_{ij}^{\infty\text{,exp}}$ of solutes $i$ at infinite dilution in solvents $j$ at temperatures 298 K, 313 K, and 333 K in the database used as the starting point for the AL study. Numbers identify solutes and solvents, cf. Tables S3 and S4 in the Supporting Information. Solutes are ordered with respect to their molar mass (bottom: low; top: high), and solvents are ordered with respect to their viscosity (left: low; right: high). The color code indicates the value of $D_{ij}^{\infty\text{,exp}}$, and black cells denote missing data.
  • Figure 3: Schematic representation of the hybrid TCM for predicting temperature-dependent $D_{ij}^\infty$ developed in this work. The TCM incorporates prior information from the SEGWE model [Evans2018] and uses the Tucker decomposition for tensor factorization.
  • Figure 4: Active learning workflow for the targeted improvement of the TCM developed in this work.
  • Figure 5: Boxplot of the ARE$_{ij}$ of the $D_{ij}^\infty$ predicted with SEGWE [Evans2018], the MCM [Grossmann2022], and the TCM developed in this work. MCM and TCM results were obtained using leave-one-out analysis; the SEGWE model was used as proposed by the original authors [Evans2018]. Boxes represent interquartile ranges (IQR), and whiskers represent 1.5 IQR.
  • ...and 2 more figures
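
The boxplot convention described for Figure 5 (boxes spanning the interquartile range, whiskers at 1.5 IQR) can be reproduced with a short sketch. Two assumptions are made here: ARE is taken to mean an absolute relative error, which the caption does not spell out, and the whiskers are taken to end at the most extreme data points that still lie within 1.5 IQR of the box, as is common in plotting libraries. The numbers are hypothetical placeholders, not data from the paper.

```python
import numpy as np


def relative_errors(predicted, experimental):
    """Absolute relative error per data point (assumed meaning of ARE)."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    return np.abs(predicted - experimental) / experimental


def boxplot_summary(values):
    """Quartiles and whisker ends for a box with 1.5 IQR whiskers."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers end at the most extreme points inside the limits;
    # anything beyond them would be drawn as an outlier.
    inside = values[(values >= lower_limit) & (values <= upper_limit)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
    }


# Hypothetical placeholder values in arbitrary units.
experimental = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])
predicted = np.array([1.0, 2.2, 4.8, 6.5, 11.2, 30.0])

errors = relative_errors(predicted, experimental)
summary = boxplot_summary(errors)
```

In a leave-one-out analysis, each entry of `predicted` would come from a model trained without the corresponding experimental point, so the resulting error distribution reflects predictions for unseen systems.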