Graph Machine Learning for Design of High-Octane Fuels
Jan G. Rittig, Martin Ritzert, Artur M. Schweidtmann, Stefanie Winkler, Jana M. Weber, Philipp Morsch, K. Alexander Heufer, Martin Grohe, Alexander Mitsos, Manuel Dahmen
TL;DR
The paper tackles the design of high-octane fuels by proposing a modular graph-ML CAMD framework that combines generative graph-ML models, graph neural networks for ignition-property prediction, and optimization to maximize the objective $p = \text{RON} + \text{OS}$ (research octane number plus octane sensitivity) in a continuous molecular space. It analyzes three generator types (JT-VAE, MHG-VAE, MolGAN) and two optimizers (Bayesian optimization and genetic algorithms), augmented by an applicability-domain mechanism based on one-class SVMs to curb unreliable extrapolations. The study demonstrates that the framework can identify well-known octane boosters (e.g., MTBE, ETBE) and new candidates (notably 2,2-DMP), while also experimentally evaluating a selected candidate to reveal limitations due to data scarcity and model extrapolation. The results underscore the potential of graph-ML CAMD for fuel design, the importance of experimental validation, and the need for expanded ignition-property datasets to improve predictive accuracy in data-constrained settings. The framework is modular and extensible to additional properties and CAMD applications beyond fuels.
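To make the role of the objective and the applicability domain concrete, the following is a minimal sketch, not the authors' implementation. It scores a latent vector $z$ by $p = \text{RON} + \text{OS}$ and replaces the score with a large penalty when $z$ falls outside the applicability domain, so an optimizer gains nothing from extrapolating. Several assumptions are labeled here and in the code: the training latents and the linear `predict_ron_os` function are hypothetical stand-ins for encoded molecules and the GNN predictor, and the applicability-domain check is a swapped-in nearest-neighbor distance test used in place of the one-class SVM described above (to keep the example dependency-free).

```python
import math

# Hypothetical training latents (toy values); in the framework these would be
# latent encodings of molecules with measured ignition properties.
TRAIN_LATENTS = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (1.2, 1.1)]


def predict_ron_os(z):
    """Hypothetical stand-in for the GNN property predictor: returns (RON, OS)."""
    ron = 90.0 + 5.0 * z[0]
    os_ = 5.0 + 2.0 * z[1]
    return ron, os_


def in_applicability_domain(z, train=TRAIN_LATENTS, max_dist=0.75):
    """Swapped-in check (not a one-class SVM): accept z if it lies within
    max_dist (Euclidean) of at least one training point."""
    return min(math.dist(z, t) for t in train) <= max_dist


def objective(z, penalty=-1000.0):
    """p = RON + OS inside the applicability domain, a large penalty outside."""
    if not in_applicability_domain(z):
        return penalty
    ron, os_ = predict_ron_os(z)
    return ron + os_


print(objective((1.0, 0.5)))  # 101.0  (RON 95.0 + OS 6.0)
print(objective((5.0, 5.0)))  # -1000.0 (far from all training points)
```

In a version closer to the described framework, the distance test could be replaced by fitting `sklearn.svm.OneClassSVM` on the training latents and treating a `predict` result of `+1` as inside the domain and `-1` as outside.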
Abstract
Fuels with high knock resistance enable modern spark-ignition engines to achieve high efficiency and thus low CO2 emissions. Identification of molecules with desired autoignition properties, indicated by a high research octane number (RON) and a high octane sensitivity (OS), is therefore of great practical relevance and can be supported by computer-aided molecular design (CAMD). Recent developments in the field of graph machine learning (graph-ML) provide novel, promising tools for CAMD. We propose a modular graph-ML CAMD framework that integrates generative graph-ML models with graph neural networks and optimization, enabling the design of molecules with desired ignition properties in a continuous molecular space. In particular, we explore the potential of Bayesian optimization and genetic algorithms in combination with generative graph-ML models. The graph-ML CAMD framework successfully identifies well-established high-octane components. It also suggests new candidates, one of which we experimentally investigate and use to illustrate the need for further autoignition training data.
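Because the generative models map molecules to a continuous latent space, an optimizer such as a genetic algorithm can search over real-valued vectors rather than discrete molecular graphs. The sketch below is a generic real-valued genetic algorithm (tournament selection, blend crossover, Gaussian mutation, elitism), offered as an illustration under stated assumptions rather than the paper's configuration: the operator choices, population size, bounds, and the toy fitness with a hypothetical optimum at `(1.0, -0.5)` are all assumptions. In the framework, the fitness would be the predicted $p = \text{RON} + \text{OS}$ of the decoded molecule, gated by the applicability domain.

```python
import random


def genetic_algorithm(fitness, dim, pop_size=30, generations=60,
                      mutation_scale=0.2, bounds=(-3.0, 3.0), seed=0):
    """Minimal real-valued GA over latent vectors (illustrative settings only)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals is a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: best fitness never decreases
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            w = rng.random()
            # Blend crossover followed by Gaussian mutation, clipped to bounds
            # (bounds are assumed to keep vectors near the latent prior).
            child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            child = [min(hi, max(lo, c + rng.gauss(0.0, mutation_scale)))
                     for c in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


def toy_fitness(z):
    """Hypothetical objective with its maximum (0.0) at z = (1.0, -0.5)."""
    return -((z[0] - 1.0) ** 2 + (z[1] + 0.5) ** 2)


best_z, best_f = genetic_algorithm(toy_fitness, dim=2)
print(best_z, best_f)
```

Elitism is a deliberate choice here: carrying the current best vector into each new generation guarantees that the best objective value found never gets worse, which matters when each fitness evaluation (decoding plus property prediction) is comparatively expensive.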
