Generative Deep Learning Framework for Inverse Design of Fuels

Kiran K. Yalamanchi, Pinaki Pal, Balaji Mohan, Abdullah S. AlRamadan, Jihad A. Badra, Yuanjiang Pei

TL;DR

The paper tackles inverse design of high-RON fuels by coupling a co-optimized variational autoencoder (Co-VAE) with a data-driven RON predictor, enabling targeted exploration of chemical space. The Co-VAE jointly optimizes molecular reconstruction and RON estimation using an LSTM-based encoder/decoder and a two-layer predictor, trained with the loss $L = \mathrm{BCE} + \beta\,\mathrm{KLD} + L_{\mathrm{RON}}$ under a β-annealing schedule. A separate regression model on the latent representations, optimized with NSGA-2, delivers state-of-the-art RON prediction (CatBoost: $R^2 = 0.929$, MAE = 5.365) and remains robust under cross-validation. Differential Evolution in the latent space then yields 1185 new high-RON candidates (1189 SMILES) with $\mathrm{RON} > 110$ that pass chemical validity checks. This demonstrates the framework's ability to discover diverse, synthesizability-relevant fuel molecules and highlights its potential for multi-property extension and uncertainty-aware design.
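As a rough illustration of the loss quoted above, the sketch below evaluates $\mathrm{BCE} + \beta\,\mathrm{KLD} + L_{\mathrm{RON}}$ for a single molecule in plain Python. Several details are assumptions rather than the paper's settings: the linear warm-up in `beta_schedule`, the absolute-error form of $L_{\mathrm{RON}}$, and all function and parameter names are hypothetical.

```python
import math

def beta_schedule(epoch, warmup_epochs=10, beta_max=1.0):
    """Assumed linear KL warm-up: 0 at epoch 0, beta_max from warmup_epochs on."""
    return beta_max * min(1.0, epoch / warmup_epochs)

def bce(targets, probs, eps=1e-12):
    """Binary cross-entropy summed over the flattened one-hot entries."""
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(targets, probs)
    )

def kld(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and the standard normal prior."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def co_vae_loss(targets, probs, mu, logvar, ron_true, ron_pred, epoch):
    """L = BCE + beta * KLD + L_RON, with beta taken from the annealing schedule."""
    beta = beta_schedule(epoch)
    l_ron = abs(ron_true - ron_pred)  # assumption: absolute error as the RON term
    return bce(targets, probs) + beta * kld(mu, logvar) + l_ron
```

With this schedule the KL term is switched off at epoch 0 and reaches full weight after the warm-up, so early training focuses on reconstruction and RON prediction before the latent space is regularized.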

Abstract

In the present work, a generative deep learning framework combining a Co-optimized Variational Autoencoder (Co-VAE) architecture with quantitative structure-property relationship (QSPR) techniques is developed to enable accelerated inverse design of fuels. The Co-VAE integrates a property prediction component coupled with the VAE latent space, enhancing molecular reconstruction and accurate estimation of Research Octane Number (RON) (chosen as the fuel property of interest). A subset of the GDB-13 database, enriched with a curated RON database, is used for model training. Hyperparameter tuning is further utilized to optimize the balance among reconstruction fidelity, chemical validity, and RON prediction. An independent regression model is then used to refine RON prediction, while a differential evolution algorithm is employed to efficiently navigate the VAE latent space and identify promising fuel molecule candidates with high RON. This methodology addresses the limitations of traditional fuel screening approaches by capturing complex structure-property relationships within a comprehensive latent representation. The generative model can be adapted to different target properties, enabling systematic exploration of large chemical spaces relevant to fuel design applications. Furthermore, the demonstrated framework can be readily extended by incorporating additional synthesizability criteria to improve applicability and reliability for de novo design of new fuels.
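The latent-space search described in the abstract can be illustrated with a minimal differential evolution loop. This is a sketch under stated assumptions, not the authors' implementation: it uses the common DE/rand/1/bin variant with in-place updates, and `toy_ron` is a hypothetical stand-in for the real pipeline, which would score a latent vector with the trained regression model. The population size, `f`, `cr`, and the bounds are illustrative values.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimise `objective` inside a box using a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, none of them the current member i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            s = objective(trial)
            if s <= scores[i]:  # greedy selection: keep the trial if not worse
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Hypothetical surrogate: "predicted RON" peaks at 120 when z equals Z_STAR.
Z_STAR = [0.5, -1.0, 1.5, 0.0]

def toy_ron(z):
    return 120.0 - sum((zi - si) ** 2 for zi, si in zip(z, Z_STAR))

# Maximise RON by minimising its negative over a 4-dimensional toy latent box.
best_z, best_neg_ron = differential_evolution(lambda z: -toy_ron(z),
                                              [(-3.0, 3.0)] * 4)
best_ron = -best_neg_ron
```

In a real workflow each promising latent vector would then be decoded to SMILES and checked for chemical validity before being counted as a candidate.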

Paper Structure

This paper contains 8 sections, 2 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Integrated approach for generative fuel design. Step 1: Co-optimize the VAE for molecular reconstruction and RON prediction. Step 2: Fine-tune the regression model to enhance RON estimation. Step 3: Explore latent space regions linked to high RON and decode the corresponding species.
  • Figure 2: Co-VAE framework used in this study: an LSTM encoder translates SMILES one-hot encoding representation into a latent space, from which an LSTM decoder reconstructs the molecular one-hot encoding, while a feedforward network co-optimizes the latent space for RON prediction.
  • Figure 3: Reconstruction accuracy, validity, and RON MAE computed on validation set for the hyperparameter optimization iterations of Co-VAE. Composite score represents the sum of reconstruction accuracy, validity and five times the inverse of RON MAE.
  • Figure 4: Parity plot for the final optimized CatBoost model. The ranges (error bars) shown for R², MAE, and RMSE reflect the variability observed for the 10-fold cross-validation study.
  • Figure 5: Workflow for the generative fuel design using Co-VAE and regression models.
  • ...and 2 more figures
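
The composite score described in the Figure 3 caption (reconstruction accuracy plus validity plus five times the inverse of the RON MAE) can be written as a one-line function. The sketch assumes accuracy and validity are expressed as fractions in [0, 1], which the caption does not state; the function name is hypothetical.

```python
def composite_score(recon_accuracy, validity, ron_mae):
    """Sum of reconstruction accuracy, validity, and 5 / (RON MAE)."""
    if ron_mae <= 0:
        raise ValueError("ron_mae must be positive")
    # Assumption: recon_accuracy and validity are fractions, not percentages.
    return recon_accuracy + validity + 5.0 / ron_mae
```

Because the MAE enters through its inverse, halving the RON error from 10 to 5 raises the score by 0.5, which is comparable in weight to a large change in accuracy or validity.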