Generative Deep Learning Framework for Inverse Design of Fuels
Kiran K. Yalamanchi, Pinaki Pal, Balaji Mohan, Abdullah S. AlRamadan, Jihad A. Badra, Yuanjiang Pei
TL;DR
The paper tackles inverse design of high-RON fuels by coupling a co-optimized variational autoencoder (Co-VAE) with a data-driven RON predictor, enabling targeted exploration of chemical space. The Co-VAE jointly optimizes molecular reconstruction and RON estimation using an LSTM-based encoder/decoder and a two-layer predictor, and is trained with the loss $L = \mathrm{BCE} + \beta\,\mathrm{KLD} + L_{\mathrm{RON}}$ under a β-annealing schedule. A separate regression model trained on the latent representations and optimized with NSGA-II refines the RON prediction (CatBoost: $R^2 = 0.929$, MAE = 5.365) and remains robust under cross-validation. Differential evolution in the latent space then yields 1185 new high-RON candidates (1189 SMILES) that pass chemical validity checks. This demonstrates that the framework can discover diverse, synthesizability-relevant fuel molecules with $\mathrm{RON} > 110$ and suggests its potential for multi-property extension and uncertainty-aware design.
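To make the training objective concrete, the following is a minimal Python sketch of the loss $L = \mathrm{BCE} + \beta\,\mathrm{KLD} + L_{\mathrm{RON}}$ for a single molecule. Several details are assumptions rather than the authors' implementation: the linear shape of the β-annealing schedule and its `warmup_epochs` and `beta_max` parameters, the use of squared error for $L_{\mathrm{RON}}$, and all function names. The KLD term is the standard closed form for a diagonal Gaussian posterior against a unit Gaussian prior.

```python
import math


def beta_schedule(epoch, warmup_epochs=10, beta_max=1.0):
    """Assumed linear annealing: beta ramps from 0 to beta_max over warmup_epochs."""
    if warmup_epochs <= 0:
        return beta_max
    return beta_max * min(1.0, epoch / warmup_epochs)


def bce(targets, probs, eps=1e-12):
    """Binary cross-entropy summed over the flattened one-hot token targets."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def kld(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def ron_loss(ron_pred, ron_true):
    """Squared error for the RON head (assumed form; not specified in the summary)."""
    return (ron_pred - ron_true) ** 2


def co_vae_loss(targets, probs, mu, logvar, ron_pred, ron_true, epoch):
    """Total loss L = BCE + beta * KLD + L_RON for one sample at a given epoch."""
    beta = beta_schedule(epoch)
    return bce(targets, probs) + beta * kld(mu, logvar) + ron_loss(ron_pred, ron_true)
```

With this assumed schedule, the KLD term contributes nothing at epoch 0 and reaches full weight once `epoch >= warmup_epochs`, so early training concentrates on reconstruction and RON prediction before the latent space is regularized.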
Abstract
In the present work, a generative deep learning framework combining a Co-optimized Variational Autoencoder (Co-VAE) architecture with quantitative structure-property relationship (QSPR) techniques is developed to enable accelerated inverse design of fuels. The Co-VAE couples a property prediction component to the VAE latent space, improving both molecular reconstruction and the estimation of Research Octane Number (RON), which is chosen as the fuel property of interest. A subset of the GDB-13 database, enriched with a curated RON database, is used for model training. Hyperparameter tuning is then used to balance reconstruction fidelity, chemical validity, and RON prediction accuracy. An independent regression model refines the RON prediction, and a differential evolution algorithm navigates the VAE latent space to identify promising fuel molecule candidates with high RON. This methodology addresses the limitations of traditional fuel screening approaches by capturing complex structure-property relationships within a comprehensive latent representation. The generative model can be adapted to different target properties, enabling systematic exploration of large chemical spaces relevant to fuel design applications. Furthermore, the demonstrated framework can be readily extended by incorporating additional synthesizability criteria to improve applicability and reliability for de novo design of new fuels.
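The latent-space search step can be illustrated with a small, self-contained differential evolution loop (the common DE/rand/1/bin variant, which the source does not confirm as the one used). This is a sketch under stated assumptions: `toy_ron_predictor` is a hypothetical stand-in for the trained regression model, the latent dimension, bounds, and the control parameters `F` and `CR` are illustrative, and the decoding of latent vectors back to SMILES and the chemical validity checks are omitted.

```python
import random


def differential_evolution(objective, dim, bounds=(-3.0, 3.0), pop_size=20,
                           F=0.8, CR=0.9, generations=100, seed=0):
    """Maximize objective(z) over latent vectors z using DE/rand/1/bin."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(z) for z in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip to the search bounds
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is at least as good.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]


def toy_ron_predictor(z):
    """Hypothetical surrogate: a quadratic peaking at 120 when every z_d = 1."""
    return 120.0 - sum((x - 1.0) ** 2 for x in z)


best_z, best_ron = differential_evolution(toy_ron_predictor, dim=4)
```

In a full pipeline, the objective would instead call the RON regression model on each latent vector, and high-scoring vectors would be decoded to SMILES and filtered for validity before being reported as candidates.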
