Deep Learning for GWP Prediction: A Framework Using PCA, Quantile Transformation, and Ensemble Modeling

Navin Rajapriya, Kotaro Kawajiri

TL;DR

The paper presents a data-driven framework for predicting the 100-year global warming potential (GWP100) of refrigerants. The framework combines PCA-based dimensionality reduction, quantile transformation to address skew, and ensembles of fully connected neural networks implemented on the Multi-Sigma platform. It systematically compares RDKit, Mordred, and alvaDesc molecular descriptors and finds that RDKit-based ensembles generalize best (RMSE ≈ 481.9, R² ≈ 0.918) on a 207-sample dataset derived from IPCC AR6. The study also uses factor analysis to interpret influential molecular features (e.g., molecular weight, lipophilicity, nitriles, and allylic oxides). Overall, the approach offers a scalable, interpretable pathway for rapid virtual screening and design of low-GWP refrigerants, in line with global climate mitigation goals and regulatory developments such as the Kigali Amendment.

Abstract

Developing environmentally sustainable refrigerants is critical for mitigating the impact of anthropogenic greenhouse gases on global warming. This study presents a predictive modeling framework to estimate the 100-year global warming potential (GWP100) of single-component refrigerants using a fully connected neural network implemented on the Multi-Sigma platform. Molecular descriptors from RDKit, Mordred, and alvaDesc were used to capture various chemical features. The RDKit-based model achieved the best performance, with a Root Mean Square Error (RMSE) of 481.9 and an R² score of 0.918, demonstrating superior predictive accuracy and generalizability. Dimensionality reduction through Principal Component Analysis (PCA) and quantile transformation were applied to address the high-dimensional and skewed nature of the dataset, enhancing model stability and performance. Factor analysis identified key molecular features, including molecular weight, lipophilicity, and functional groups such as nitriles and allylic oxides, as significant contributors to GWP values. These insights provide actionable guidance for designing environmentally sustainable refrigerants. Integrating RDKit descriptors with Multi-Sigma's framework, which includes PCA, quantile transformation, and neural networks, provides a scalable solution for the rapid virtual screening of low-GWP refrigerants. This approach can potentially accelerate the identification of eco-friendly alternatives, directly contributing to climate mitigation by enabling the design of next-generation refrigerants aligned with global sustainability objectives.
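
The two preprocessing steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the Multi-Sigma implementation, whose internals are not described here. The data are synthetic stand-ins (random "descriptors" and a log-normal "GWP" target), the function names `pca_by_variance`, `quantile_to_uniform`, and `inverse_quantile` are hypothetical, and the 99% variance threshold and uniform output scale are taken from the figure captions of the paper.

```python
import numpy as np

def pca_by_variance(X, threshold=0.99):
    """Standardize X and keep the fewest PCs whose cumulative variance reaches threshold."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant descriptor columns
    Xs = (X - X.mean(axis=0)) / std
    # Squared singular values of the centered data are proportional to PC variances.
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    cumulative = np.cumsum(S**2 / np.sum(S**2))
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(S))
    return Xs @ Vt[:k].T, cumulative[:k]

def quantile_to_uniform(y):
    """Map a skewed target onto [0, 1] through its empirical CDF."""
    ref = np.sort(y)                         # sorted training values
    grid = np.linspace(0.0, 1.0, len(ref))   # matching quantile levels
    return np.interp(y, ref, grid), ref, grid

def inverse_quantile(u, ref, grid):
    """Map uniform-scale values back to the original target scale."""
    return np.interp(u, grid, ref)

# Synthetic stand-in data (assumption): 207 samples, 60 correlated descriptors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(207, 10))
X = latent @ rng.normal(size=(10, 60)) + 0.01 * rng.normal(size=(207, 60))
y = rng.lognormal(mean=6.0, sigma=2.0, size=207)   # heavily right-skewed target

Z, cum = pca_by_variance(X, threshold=0.99)
u, ref, grid = quantile_to_uniform(y)
print("PCs kept:", Z.shape[1])
print("uniform-scale target range:", u.min(), u.max())
```

A model trained on `Z` to predict `u` would have its outputs passed through `inverse_quantile` to recover predictions on the original GWP scale, which is the scale on which an RMSE such as the one reported above is meaningful.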

Paper Structure

This paper contains 12 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Framework for predicting GWP values using molecular descriptors, PCA, quantile transformation, and neural networks, implemented on the Multi-Sigma platform.
  • Figure 2: Distribution of GWP100 values before and after quantile transformation: (a) original scale; (b) uniform scale after quantile transformation.
  • Figure 3: Cumulative explained variance as a function of the number of PCs for each molecular descriptor package: (a) RDKit (48 PCs), (b) Mordred (73 PCs), and (c) alvaDesc (99 PCs). The red dashed line represents the 99% variance threshold used to select the PCs for dimensionality reduction.
  • Figure 4: Predicted vs. true GWP100 values for the top three models and ensemble predictions for each molecular descriptor package. The dotted line represents the ideal trend, and the dashed line shows the ensemble trend.
  • Figure 5: Contribution of principal components (PCs) to GWP predictions based on factor analysis of the RDKit-based ensemble model.
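
Figure 4 compares individual models with their ensemble, and the paper reports RMSE and R² as its metrics. The sketch below shows how such a comparison could be computed. It assumes the ensemble is a simple average of the member predictions, which the text above does not confirm. All numbers are toy values, not results from the paper, and the helper names `ensemble_mean`, `rmse`, and `r2` are hypothetical.

```python
import numpy as np

def ensemble_mean(predictions):
    """Average the predictions of several models (assumed combination rule)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

def rmse(y_true, y_pred):
    """Root mean square error on the original target scale."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy values for illustration only.
y_true = np.array([10.0, 200.0, 1500.0, 4000.0])
member_preds = [
    [30.0, 150.0, 1700.0, 3600.0],
    [5.0, 260.0, 1400.0, 4300.0],
    [20.0, 190.0, 1550.0, 3900.0],
]
y_ens = ensemble_mean(member_preds)

for i, pred in enumerate(member_preds, start=1):
    print(f"model {i}: RMSE={rmse(y_true, pred):.1f}, R2={r2(y_true, pred):.3f}")
print(f"ensemble: RMSE={rmse(y_true, y_ens):.1f}, R2={r2(y_true, y_ens):.3f}")
```

Averaging tends to cancel member errors that point in opposite directions, which is one common motivation for ensembling. Whether it helps on a given dataset has to be checked on held-out data.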