Advancing Thermodynamic Group-Contribution Methods by Machine Learning: UNIFAC 2.0

Nicolas Hayer, Thorsten Wendel, Stephan Mandt, Hans Hasse, Fabian Jirasek

TL;DR

This work addresses the incomplete parameterization of thermodynamic group-contribution (GC) methods, notably UNIFAC, by embedding a matrix-completion machine-learning module that predicts all pair-interaction parameters $a_{mn}$. The resulting UNIFAC 2.0 is trained end-to-end on logarithmic activity coefficients $\ln\gamma_i$ derived from binary VLE data. This yields a gap-free parameter table, nearly halves the MSE, and extends coverage to thousands of additional mixtures in the Dortmund Data Bank (DDB). The method extrapolates robustly to unseen components and unseen pair-interaction parameters, and it can be readily updated with new data or tailored to specific applications. In practice, adopting UNIFAC 2.0 in an existing simulator only requires replacing the parameter table, which enables more reliable and more widely applicable thermodynamic predictions for process design and optimization.
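To make the matrix-completion idea concrete, the following is a minimal sketch of how a partially observed parameter matrix can be filled in with a low-rank factorization. It is an illustration under stated assumptions, not the authors' implementation: the function name `complete_matrix`, the rank, the learning rate, the regularization strength, and the synthetic data are all hypothetical. The sketch also fits the observed matrix entries directly, whereas UNIFAC 2.0 as described above is trained end-to-end on $\ln\gamma_i$.

```python
import numpy as np

def complete_matrix(a_obs, mask, rank=2, lr=0.02, l2=1e-3, steps=3000, seed=0):
    """Fill unobserved entries of a square matrix with a low-rank model U @ V.T.

    a_obs : (n, n) array; values where mask is False are ignored.
    mask  : (n, n) boolean array; True marks an observed entry.
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = a_obs.shape[0]
    U = 0.1 * rng.standard_normal((n, rank))  # latent features of group m
    V = 0.1 * rng.standard_normal((n, rank))  # latent features of group n
    for _ in range(steps):
        # Residuals count only on observed entries.
        resid = np.where(mask, U @ V.T - a_obs, 0.0)
        # Gradients of 0.5*sum(resid**2) + 0.5*l2*(|U|^2 + |V|^2).
        grad_U = resid @ V + l2 * U
        grad_V = resid.T @ U + l2 * V
        U -= lr * grad_U
        V -= lr * grad_V
    a_full = U @ V.T
    # UNIFAC convention: a group does not interact with itself, a_mm = 0.
    np.fill_diagonal(a_full, 0.0)
    return a_full

# Synthetic demonstration (unit-scale numbers, not real UNIFAC parameters).
rng = np.random.default_rng(1)
n_groups = 6
a_true = rng.standard_normal((n_groups, 2)) @ rng.standard_normal((2, n_groups))
mask = rng.random((n_groups, n_groups)) < 0.7  # roughly 70 % of pairs "fitted"
np.fill_diagonal(mask, False)                  # diagonal is fixed, not learned
a_full = complete_matrix(np.where(mask, a_true, 0.0), mask)
print("filled entries:", int((~mask).sum()), "of", a_full.size)
```

The design point is that each group gets its own latent feature vector, so a pair $(m, n)$ that was never fitted still receives a value from features learned through the other pairs in which $m$ and $n$ appear.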

Abstract

Accurate prediction of thermodynamic properties is pivotal in chemical engineering for optimizing process efficiency and sustainability. Physical group-contribution (GC) methods are widely employed for this purpose but suffer from historically grown, incomplete parameterizations, limiting their applicability and accuracy. In this work, we overcome these limitations by combining GC with matrix completion methods (MCM) from machine learning. We use the novel approach to predict a complete set of pair-interaction parameters for the most successful GC method: UNIFAC, the workhorse for predicting activity coefficients in liquid mixtures. The resulting new method, UNIFAC 2.0, is trained and validated on more than 224,000 experimental data points, showcasing significantly enhanced prediction accuracy (e.g., nearly halving the mean squared error) and increased scope by eliminating gaps in the original model's parameter table. Moreover, the generic nature of the approach facilitates updating the method with new data or tailoring it to specific applications.
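For orientation, it may help to write out where the pair-interaction parameters enter the model. The relations below are the standard expressions of original UNIFAC as found in the general literature; they are not quoted from this summary, and the symbols $\Gamma_k$, $\Theta_m$, $Q_k$, $\nu_k^{(i)}$, and $\Psi_{mn}$ are notation assumed here.

$$
\ln\gamma_i = \ln\gamma_i^{\mathrm{C}} + \ln\gamma_i^{\mathrm{R}}, \qquad
\ln\gamma_i^{\mathrm{R}} = \sum_k \nu_k^{(i)} \left( \ln\Gamma_k - \ln\Gamma_k^{(i)} \right)
$$

$$
\ln\Gamma_k = Q_k \left[ 1 - \ln\left( \sum_m \Theta_m \Psi_{mk} \right) - \sum_m \frac{\Theta_m \Psi_{km}}{\sum_n \Theta_n \Psi_{nm}} \right], \qquad
\Psi_{mn} = \exp\left( -\frac{a_{mn}}{T} \right)
$$

Here $\nu_k^{(i)}$ is the number of groups of type $k$ in component $i$, $Q_k$ is the group surface parameter, $\Theta_m$ is the surface fraction of group $m$, and $\Gamma_k^{(i)}$ is evaluated in pure component $i$. By convention $a_{mm} = 0$, and in general $a_{mn} \neq a_{nm}$. Because every $\Psi_{mn}$ between groups present in the mixture appears in the sums, a single missing $a_{mn}$ makes $\ln\gamma_i$ impossible to evaluate for that mixture, which is why gaps in the parameter table translate directly into gaps in applicability.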

Paper Structure (9 sections, 4 equations, 6 figures)

This paper contains 9 sections, 4 equations, 6 figures.

Figures (6)

  • Figure 1: Comparison of UNIFAC 1.0 and UNIFAC 2.0. UNIFAC 1.0 relies on sequential parameter fitting guided by intuition, whereas UNIFAC 2.0 integrates a matrix completion method (MCM) for predicting pair-interaction parameters into the UNIFAC framework. UNIFAC 2.0 is trained end-to-end on experimental logarithmic activity coefficients ($\ln\gamma_i$) derived from binary vapor-liquid equilibrium (VLE) data. After training, the completed pair-interaction parameter matrix facilitates accurate predictions of phase diagrams for a wide range of binary or multi-component mixtures.
  • Figure 2: Comparison of results for $\ln\gamma_i$ with UNIFAC 1.0 and UNIFAC 2.0 for different data sets: the "UNIFAC 1.0 horizon" comprises 210,767 data points for 15,758 binary mixtures, while an additional 13,795 experimental data points for 2,957 binary mixtures can only be predicted with UNIFAC 2.0 ("UNIFAC 2.0 only"). (a) Mean absolute error (MAE) and mean squared error (MSE) of the model predictions. Error bars denote standard errors of the means. (b) Histogram of the number of binary mixtures $N_\text{mix}$ that can be predicted with an MAE in a certain interval. The MAE range shown in panel (b) comprises 98.8% (UNIFAC 1.0) and 99.4% (UNIFAC 2.0) of all mixtures.
  • Figure 3: Prediction of isothermal vapor-liquid phase diagrams for binary mixtures with UNIFAC 2.0 (lines) and comparison to experimental data from the DDB (symbols). Blue: bubble point curves. Orange: dew point curves.
  • Figure 4: Prediction of isothermal vapor-liquid phase diagrams for ternary mixtures with UNIFAC 2.0 (pred) and comparison to experimental data (exp) from the DDB. The temperature and the composition of the liquid phase were specified, and the composition of the corresponding vapor phase in equilibrium was predicted. Solid lines are experimental conodes, dashed lines are predicted conodes.
  • Figure 5: Mean absolute error (MAE) and mean squared error (MSE) of the predicted $\ln\gamma_i$ of mixtures containing unobserved components with UNIFAC 2.0 (pred). For comparison, the results of UNIFAC 2.0 trained on all experimental data and UNIFAC 1.0 are also shown (fit). The "UNIFAC 1.0 horizon" comprises 25,998 data points for 2,202 binary mixtures, while an additional 1,289 experimental data points for 401 binary mixtures can only be predicted by UNIFAC 2.0 ("UNIFAC 2.0 only"). Error bars denote standard errors of the means.
  • ...and 1 more figure
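The error measures quoted in the captions for Figures 2 and 5 (MAE, MSE, and standard errors of the means) can be sketched as follows. This is a minimal illustration, assuming the errors are aggregated over individual data points and that the standard error is the sample standard deviation divided by $\sqrt{N}$; the captions do not spell out these details, and the function name `error_summary` and the sample numbers are hypothetical.

```python
import numpy as np

def error_summary(ln_gamma_exp, ln_gamma_pred):
    """Return MAE and MSE of predicted ln(gamma) with standard errors of the means.

    Assumes one error per data point and independent samples (an assumption,
    not something stated in the figure captions).
    """
    err = np.asarray(ln_gamma_pred, dtype=float) - np.asarray(ln_gamma_exp, dtype=float)
    abs_err = np.abs(err)
    sq_err = err ** 2
    n = err.size

    def sem(x):
        # Standard error of the mean: sample std (ddof=1) over sqrt(N).
        return x.std(ddof=1) / np.sqrt(n)

    return {
        "MAE": abs_err.mean(),
        "MAE_se": sem(abs_err),
        "MSE": sq_err.mean(),
        "MSE_se": sem(sq_err),
    }

# Made-up numbers purely to show the call; not data from the paper.
summary = error_summary([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 2.0, -2.0])
print(summary)
```

Because the MSE squares each deviation, it weights large outliers much more heavily than the MAE, so reporting both gives a fuller picture of where a model improves.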