Navigating the Fragrance Space via Graph Generative Models and Predicting Odors
Mrityunjay Sharma, Sarabeshwar Balaji, Pinaki Saha, Ritesh Kumar
TL;DR
This study addresses the challenge of navigating the fragrance space by combining graph-based generative models with odor prediction, so that odorous molecules can be generated and evaluated efficiently. It examines six graph-based generative approaches (GAE, VGAE, ARGA, ARGVA, diffusion, and transformer-based models) alongside a logistic-regression odor-likeliness predictor trained on 51 features, reduced to 20 by SHAP-guided selection, which achieves a ROC AUC of $0.9701$. The pipeline has four stages: generation, validation, fragrance-likeliness screening, and odor prediction. Evaluated with MOSES benchmarks and interpreted via SHAP, it yields high validity and actionable insight into feature contributions, notably logP, molecular weight, and related descriptors. The code and data are open source, supporting reproducible olfactory design and broader applications in fragrance research and chemical space exploration.
Abstract
We explore a suite of generative modelling techniques to efficiently navigate the complex landscapes of odor and the broader chemical space. Unlike traditional approaches, we not only generate molecules but also predict their odor likeliness, with a ROC AUC score of 0.97, and assign probable odor labels. We correlate odor likeliness with the physicochemical features of molecules using machine learning techniques and leverage SHAP (SHapley Additive exPlanations) to demonstrate the interpretability of the learned function. The whole process involves four key stages: molecule generation, stringent sanitization checks for molecular validity, fragrance-likeliness screening, and odor prediction of the generated molecules. By making our code and trained models publicly accessible, we aim to facilitate broader adoption of our research across applications in fragrance discovery and olfactory research.
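The four stages described above can be pictured as a simple filter chain. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_pipeline`, `Candidate`, the `threshold` cutoff, and every `toy_*` function are hypothetical names introduced only for illustration. In practice the stages would be a trained generative model, a cheminformatics sanitization check, the odor-likeliness classifier, and the odor-label predictor.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    smiles: str
    odor_likeliness: float
    odor_labels: List[str]


def run_pipeline(
    generate: Callable[[int], Iterable[str]],
    is_valid: Callable[[str], bool],
    odor_likeliness: Callable[[str], float],
    predict_odors: Callable[[str], List[str]],
    n_samples: int,
    threshold: float = 0.5,  # assumed cutoff, not a value from the paper
) -> List[Candidate]:
    """Chain the four stages; each stage is passed in as a callable."""
    kept = []
    for smiles in generate(n_samples):       # stage 1: molecule generation
        if not is_valid(smiles):             # stage 2: validity / sanitization
            continue
        score = odor_likeliness(smiles)      # stage 3: fragrance-likeliness screening
        if score < threshold:
            continue
        labels = predict_odors(smiles)       # stage 4: odor label prediction
        kept.append(Candidate(smiles, score, labels))
    return kept


# Toy stand-ins so the sketch runs end to end. None of these reflect real
# chemistry or the models used in the paper.
def toy_generate(n: int) -> List[str]:
    pool = ["CCO", "C1CC", "CC(=O)OCC", "c1ccccc1C=O"]
    return pool[:n]


def toy_is_valid(smiles: str) -> bool:
    # Crude placeholder: every ring-closure digit must appear an even number of times.
    return all(smiles.count(d) % 2 == 0 for d in "123456789")


def toy_likeliness(smiles: str) -> float:
    # Placeholder score that grows with string length, capped at 1.0.
    return min(1.0, len(smiles) / 10)


def toy_predict(smiles: str) -> List[str]:
    # Placeholder labelling rule, purely illustrative.
    return ["fruity"] if "OCC" in smiles else ["unlabelled"]


results = run_pipeline(
    toy_generate, toy_is_valid, toy_likeliness, toy_predict, n_samples=4
)
for c in results:
    print(c.smiles, c.odor_likeliness, c.odor_labels)
```

Passing the stages in as callables keeps the control flow independent of any particular generator or classifier, so each of the generative approaches can be swapped in without changing the screening logic.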
