
Predicting the Temperature Dependence of Surfactant CMCs Using Graph Neural Networks

Christoforos Brozos, Jan G. Rittig, Sandip Bhattacharya, Elie Akanny, Christina Kohlmann, Alexander Mitsos

TL;DR

This work addresses predicting the temperature dependence of surfactant CMCs across ionic, nonionic, zwitterionic, and sugar-based classes by developing an end-to-end GNN that incorporates temperature into the molecular fingerprint. An ensemble of GINEConv-based GNNs is trained on a dataset of $1{,}377$ CMC measurements from $492$ unique surfactants spanning $0^\circ$C to $90^\circ$C. It is evaluated under two splitting schemes: one tests predictions at unseen temperatures for surfactants measured at other temperatures in the training data, and one tests generalization to unseen surfactant structures. The model achieves high predictive performance, with $R^2 \approx 0.97$ for the different-temperature split and $R^2 \approx 0.94$ for the distinct-surfactant split, and RMSEs of about $0.173$ and $0.251$ (log CMC units), respectively, outperforming several prior temperature-independent approaches on larger, more diverse datasets. While the approach is robust, it underestimates some temperature sensitivities and shows class-dependent variability, especially for sugar-based surfactants. This motivates future work on more data, possibly geometry-aware GNNs, pH effects, and explainability.
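
One plausible reading of "incorporates temperature into the molecular fingerprint" is that the pooled graph embedding is concatenated with a temperature feature before the prediction head, and that the ensemble output is the mean over independently initialized members. The following minimal sketch illustrates only that idea under stated assumptions: it is not the authors' GINEConv network, node embeddings are taken as given (no message passing), weights are random and untrained, and the names `sum_pool`, `TemperatureAwareHead`, `ensemble_predict`, and the scaling of temperature by $90^\circ$C are all hypothetical.

```python
import random
from typing import List, Sequence


def sum_pool(node_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Sum node embeddings into one graph-level fingerprint (hypothetical readout)."""
    dim = len(node_embeddings[0])
    return [sum(node[i] for node in node_embeddings) for i in range(dim)]


class TemperatureAwareHead:
    """Hypothetical linear head on [fingerprint, scaled T] -> predicted log10 CMC.

    Weights are random and untrained; this only shows where temperature enters.
    """

    def __init__(self, fingerprint_dim: int, seed: int) -> None:
        rng = random.Random(seed)  # a different seed per ensemble member
        # One extra weight for the temperature feature appended to the fingerprint.
        self.weights = [rng.gauss(0.0, 0.1) for _ in range(fingerprint_dim + 1)]
        self.bias = 0.0

    def predict(self, fingerprint: Sequence[float], temperature_c: float) -> float:
        # Assumed scaling: divide by 90 degC so the 0-90 degC range maps to [0, 1].
        features = list(fingerprint) + [temperature_c / 90.0]
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


def ensemble_predict(
    models: Sequence[TemperatureAwareHead],
    fingerprint: Sequence[float],
    temperature_c: float,
) -> float:
    """Average the predictions of all ensemble members."""
    preds = [m.predict(fingerprint, temperature_c) for m in models]
    return sum(preds) / len(preds)


# Usage: one toy molecule (two made-up node embeddings) queried at two temperatures.
nodes = [[0.1, 0.2, 0.3], [0.0, 0.5, -0.1]]
fingerprint = sum_pool(nodes)
ensemble = [TemperatureAwareHead(len(fingerprint), seed=s) for s in range(5)]
y_25 = ensemble_predict(ensemble, fingerprint, 25.0)
y_60 = ensemble_predict(ensemble, fingerprint, 60.0)
```

Because temperature enters as an input feature rather than as a separate per-temperature model, the same structure can be queried at any temperature; whether the learned temperature response is physically accurate depends entirely on training data, which is consistent with the underestimated temperature sensitivities noted above.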

Abstract

The critical micelle concentration (CMC) of surfactant molecules is an essential property for surfactant applications in industry. Recently, classical QSPR and Graph Neural Networks (GNNs), a deep learning technique, have been successfully applied to predict the CMC of surfactants at room temperature. However, these models have not yet considered the temperature dependence of the CMC, which is highly relevant for practical applications. We herein develop a GNN model for temperature-dependent CMC prediction of surfactants. We collect about 1400 data points from public sources for all surfactant classes, i.e., ionic, nonionic, and zwitterionic, at multiple temperatures. We test the predictive quality of the model for the following scenarios: i) CMC data for a surfactant are present in the training data at one or more different temperatures, and ii) CMC data for a surfactant are not present in the training data, i.e., generalization to unseen surfactants. In both test scenarios, our model exhibits a high predictive performance of $R^2 \geq 0.94$ on test data. We also find that the model performance varies by surfactant class. Finally, we evaluate the model for sugar-based surfactants with complex molecular structures, as these represent a more sustainable alternative to synthetic surfactants and are therefore of great interest for future applications in the personal and home care industries.
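
The two test scenarios differ only in how data points are assigned to training and test sets. A minimal sketch follows, assuming each record is a hypothetical `(surfactant_id, temperature_c, log10_cmc)` tuple; the function names, the random selection, and the meaning of `test_fraction` are illustrative assumptions, not the authors' splitting procedure. Scenario ii holds out whole surfactants; scenario i holds out one temperature per selected surfactant while keeping its other temperatures in training.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (surfactant_id, temperature in degC, log10 CMC).
Record = Tuple[str, float, float]


def distinct_surfactant_split(
    records: Sequence[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Scenario ii: hold out whole surfactants, so no test surfactant is seen in training."""
    ids = sorted({r[0] for r in records})
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(test_fraction * len(ids)))
    test_ids = set(ids[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


def different_temperature_split(
    records: Sequence[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Scenario i: move one temperature of a selected surfactant to the test set.

    Only surfactants measured at >= 2 temperatures are eligible; each is selected
    with probability `test_fraction`, and its remaining points stay in training.
    """
    by_id: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        by_id[r[0]].append(r)
    rng = random.Random(seed)
    train: List[Record] = []
    test: List[Record] = []
    for sid in sorted(by_id):
        points = list(by_id[sid])
        rng.shuffle(points)
        if len(points) >= 2 and rng.random() < test_fraction:
            test.append(points[0])      # one temperature is held out
            train.extend(points[1:])    # the other temperatures remain in training
        else:
            train.extend(points)
    return train, test


# Toy records (made-up values, not from the paper's dataset).
data = [
    ("A", 25.0, 1.0), ("A", 40.0, 0.9),
    ("B", 25.0, 2.0), ("B", 50.0, 1.8), ("B", 70.0, 1.7),
    ("C", 25.0, 3.0),
    ("D", 30.0, 0.5), ("D", 60.0, 0.4),
]
train_i, test_i = different_temperature_split(data, test_fraction=1.0)
train_ii, test_ii = distinct_surfactant_split(data, test_fraction=0.25)
```

Splitting by surfactant identity (a group split) is what prevents leakage in scenario ii: a purely point-wise random split would usually leave some temperature of a test surfactant in the training set, which turns it into scenario i.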

Paper Structure (16 sections, 11 figures, 10 tables)

Figures (11)

  • Figure 1: Number of CMC data points over the temperature range $T = 10$ to $90^\circ$C, binned in intervals of $10^\circ$C.
  • Figure 2: Experimental CMC values (in mM) at different temperatures for six surfactants in our database: (S1) dodecyltrimethylammonium chloride (DTAC) Perger2007, (S2) benzyl (3-hexadecanoylaminoethyl)dimethylammonium chloride (C15AEtBzMe2Cl) Galgano2010, (S3) decyl diglucoside Brinatti2014, (S4) sulfobetaine 14 Cheng2012, (S5) sodium decyl 2-sulfate Mukerjee1971, and (S6) sodium decanoate GonzalezPerez2005. Each of these surfactants shows a different temperature dependence.
  • Figure 3: Schematic representation of the developed graph neural network for predicting temperature-dependent CMC values of surfactant monomers.
  • Figure 4: Parity plots of the GNN ensemble on the two test sets: (a) different temperature and (b) distinct surfactant. Surfactant classes are distinguished by color and marker. CMC values are plotted as $\log_{10}$ of the CMC in $\mu$M.
  • Figure 5: Mean absolute percentage error (MAPE) per temperature range on the two test sets: different temperature (yellow) and distinct surfactant (blue). Temperature ranges are left-open and right-closed, e.g., (0, 10] and (10, 20]. The number of data points in each range is shown above the corresponding bar.
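
Figures 4 and 5 rely on a unit convention ($\log_{10}$ of the CMC in $\mu$M, whereas Figure 2 lists CMCs in mM) and on standard regression metrics, with MAPE reported per left-open, right-closed temperature range. The following minimal sketch shows these computations in plain Python. The function names are hypothetical, and because the captions do not state whether MAPE is computed on CMC or on log CMC values, `binned_mape` simply uses whatever values it is given.

```python
import math
from typing import Dict, Sequence, Tuple


def log10_cmc_um_from_mm(cmc_mm: float) -> float:
    """Convert a CMC in mM to log10 of the CMC in uM (1 mM = 1000 uM)."""
    return math.log10(cmc_mm * 1000.0)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the inputs (e.g. log CMC units)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def binned_mape(
    temps: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
    edges: Sequence[float],
) -> Dict[Tuple[float, float], Tuple[float, int]]:
    """MAPE (%) and point count per temperature bin (a, b]: left-open, right-closed.

    Empty bins are omitted. Assumes y_true contains no zeros.
    """
    result: Dict[Tuple[float, float], Tuple[float, int]] = {}
    for a, b in zip(edges[:-1], edges[1:]):
        idx = [i for i, t in enumerate(temps) if a < t <= b]
        if idx:
            ape = [abs((y_true[i] - y_pred[i]) / y_true[i]) for i in idx]
            result[(a, b)] = (100.0 * sum(ape) / len(idx), len(idx))
    return result


# Usage with made-up numbers: a point at exactly 10 degC falls in (0, 10], not (10, 20].
bins = binned_mape(
    temps=[10.0, 15.0, 20.0, 25.0],
    y_true=[2.0, 2.0, 4.0, 4.0],
    y_pred=[1.0, 2.0, 4.0, 5.0],
    edges=[0.0, 10.0, 20.0, 30.0],
)
# bins == {(0.0, 10.0): (50.0, 1), (10.0, 20.0): (0.0, 2), (20.0, 30.0): (25.0, 1)}
```

Since RMSE is computed on $\log_{10}$ values, an RMSE of $0.251$ corresponds roughly to a typical multiplicative error of $10^{0.251} \approx 1.8$ in the CMC itself.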