
Navigating the Fragrance space Via Graph Generative Models And Predicting Odors

Mrityunjay Sharma, Sarabeshwar Balaji, Pinaki Saha, Ritesh Kumar

TL;DR

This study addresses the challenge of navigating the fragrance space by integrating graph-based generative models with odor prediction to efficiently generate and evaluate odorous molecules. It examines six graph-centric generative approaches (GAE, VGAE, ARGA, ARGVA, diffusion, and transformer-based models) alongside a logistic-regression odor-likeliness predictor trained on 51 features reduced to 20 by SHAP-guided selection, achieving a ROC AUC of $0.9701$. A four-stage pipeline—generation, validation, fragrance-likeliness screening, and odor prediction—coupled with MOSES benchmarks and interpretability via SHAP provides both high validity and actionable insights into feature contributions, notably logP, molecular weight, and related descriptors. The framework supports scalable fragrance discovery with open-source code and data, enabling reproducible olfactory design and broader applications in fragrance research and chemical space exploration.
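The odor-likeliness predictor is described as a logistic regression evaluated by ROC AUC. Below is a minimal pure-Python sketch of both pieces. The feature vectors, weights and bias are hypothetical placeholders, not the paper's 20 SHAP-selected features or its fitted coefficients. ROC AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen odorous molecule scores higher than a randomly chosen odorless one, with ties counted as one half.

```python
import math
from typing import Sequence


def odor_likeliness(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression probability that a molecule is odorous: sigmoid(w . x + b)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # Numerically stable sigmoid: never calls exp() on a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical standardized descriptors, e.g. [z-scored logP, z-scored molecular weight].
    X = [[2.5, 0.5], [3.0, 0.2], [0.5, 2.0], [-0.2, 1.5]]
    y = [1, 1, 0, 0]            # 1 = odorous, 0 = odorless
    w, b = [1.2, -0.8], 0.1     # hypothetical coefficients
    scores = [odor_likeliness(x, w, b) for x in X]
    print(scores, roc_auc(y, scores))
```

The pairwise loop costs O(P·N) comparisons, which is fine for illustration; for large evaluation sets a sort-based computation or scikit-learn's `roc_auc_score` is the usual choice.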

Abstract

We explore a suite of generative modelling techniques to efficiently navigate and explore the complex landscapes of odor and the broader chemical space. Unlike traditional approaches, we not only generate molecules but also predict their odor likeliness, with a ROC AUC score of 0.97, and assign probable odor labels. We correlate odor likeliness with the physicochemical features of molecules using machine learning techniques and leverage SHAP (SHapley Additive exPlanations) to demonstrate the interpretability of the resulting odor-likeliness function. The whole process involves four key stages: molecule generation, stringent sanitization checks for molecular validity, fragrance likeliness screening, and odor prediction of the generated molecules. By making our code and trained models publicly accessible, we aim to facilitate broader adoption of our research across applications in fragrance discovery and olfactory research.
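The four stages can be read as a filter chain over generated candidates. The sketch below is a hypothetical, dependency-free illustration of that control flow, not the authors' code: `MolGraph`, `MAX_VALENCE`, `passes_valence_check`, `run_pipeline` and the stand-in scorers are all assumptions. The valence rule is a deliberately simplified stand-in for full chemical sanitization (a real pipeline would more likely delegate to a toolkit, for example RDKit's `Chem.SanitizeMol`), and novelty is checked against an in-memory set of canonical keys rather than a database lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical maximum valences for a few neutral heavy atoms (default states only).
MAX_VALENCE: Dict[str, int] = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1}


@dataclass
class MolGraph:
    atoms: List[str]                    # node labels (element symbols)
    bonds: List[Tuple[int, int, int]]   # edges as (atom_i, atom_j, bond_order)
    key: str                            # hypothetical canonical identifier, e.g. canonical SMILES


def passes_valence_check(g: MolGraph) -> bool:
    """Stage 2 stand-in: reject graphs whose bond orders exceed an atom's maximum valence.
    Any remaining valence is assumed to be filled by implicit hydrogens."""
    used = [0] * len(g.atoms)
    for i, j, order in g.bonds:
        if i == j or order < 1 or not (0 <= i < len(g.atoms) and 0 <= j < len(g.atoms)):
            return False
        used[i] += order
        used[j] += order
    return all(
        sym in MAX_VALENCE and used[k] <= MAX_VALENCE[sym]
        for k, sym in enumerate(g.atoms)
    )


def run_pipeline(
    candidates: List[MolGraph],                     # stage 1 output: generated graphs
    known_keys: Set[str],                           # molecules treated as "not novel"
    likeliness: Callable[[MolGraph], float],        # stage 3 scorer, e.g. a logistic model
    predict_odor: Callable[[MolGraph], List[str]],  # stage 4 predictor, e.g. a GNN
    threshold: float = 0.5,
) -> List[Tuple[str, List[str]]]:
    results = []
    for g in candidates:
        if not passes_valence_check(g):   # stage 2: validity
            continue
        if g.key in known_keys:           # keep only novel molecules
            continue
        if likeliness(g) < threshold:     # stage 3: fragrance-likeliness screen
            continue
        results.append((g.key, predict_odor(g)))  # stage 4: odor labels
    return results


if __name__ == "__main__":
    ethanol = MolGraph(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)], "CCO")
    acetone = MolGraph(["C", "C", "O", "C"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)], "CC(=O)C")
    overbonded = MolGraph(["O", "C", "C"], [(0, 1, 2), (0, 2, 1)], "bad")  # O would use 3 > 2

    def toy_score(g: MolGraph) -> float:       # hypothetical scorer
        return 0.9 if "O" in g.atoms else 0.1

    def toy_odor(g: MolGraph) -> List[str]:    # hypothetical predictor
        return ["hypothetical_label"]

    print(run_pipeline([ethanol, acetone, overbonded], {"CCO"}, toy_score, toy_odor))
```

Ordering the cheap structural checks before the learned scorers means the more expensive models only ever see valid, novel candidates.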

Paper Structure

This paper contains 25 sections, 5 equations, 16 figures, 5 tables.

Figures (16)

  • Figure 1: Overview of the methodology: (a) Node and edge features are extracted from the odorous molecules of the curated dataset aryan_amit_barsainyan_ritesh_kumar_pinaki_saha_michael_schmuker_2023, and the SMILES strings are converted into a PyTorch Geometric dataset. (b) Generative models are then applied to generate graphs. (c) The generated graphs are checked for molecular stability based on their node and edge features. Molecules are constructed from the resulting graphs, converted into SMILES strings, and checked against the PubChem database kim2023pubchem. (d) The valid and novel SMILES strings are checked for odor likeliness using an equation built from physicochemical features. (e) If a generated molecule is valid and odorous, its odor is predicted by a graph neural network based model.
  • Figure 2: Logistic Regression Model Performance Analysis. (a) SHAP (SHapley Additive exPlanations) analysis: summary plot illustrating the contribution of key molecular descriptors to the prediction of fragrance likeliness. (b) Receiver Operating Characteristic (ROC) curve, showing high true positive rates at low false positive rates.
  • Figure 3: Comparison of molecular properties of the molecules generated by ARGVA. (a) Kolmogorov-Smirnov (KS) test of the parameters used for odor likeliness. (b) Fingerprint analysis. (c) Functional group analysis of the generated set. Refer to the supplementary section for results of the other models.
  • Figure 4: Sample of novel generated molecules (not present in the training set): visualization of molecular structures and their predicted odors, assigned from 138 odor labels using graph neural networks. (a) Results from the transformer model. (b) Results from the GAE model. Refer to the supplementary section for results of the other graph generative models.
  • Figure 5: Comparison of the common odors predicted for molecules from various generative models: some models yield a wide variety of odors, whereas others yield the same odors in high quantity. Odor labels predicted for molecules generated by (a) GAE, (b) VGAE, (c) ARGA and (d) ARGVA.
  • ...and 11 more figures
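Figure 3(a) reports a KS test on the descriptors used for odor likeliness, presumably comparing generated molecules with a reference set such as the training data (this section does not say which). As a hedged illustration, the pure-Python sketch below computes only the two-sample KS statistic D, the largest vertical gap between the two empirical CDFs; the function name and the logP values in the example are hypothetical. A p-value additionally requires the KS distribution, which `scipy.stats.ks_2samp` provides alongside the same statistic.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|,
    where F_a and F_b are the empirical CDFs of the two samples."""
    if len(sample_a) == 0 or len(sample_b) == 0:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both ECDFs are right-continuous step functions that only jump at observed
    # values, so the largest gap is reached at one of those values.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample a that is <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


if __name__ == "__main__":
    reference_logp = [1.2, 2.0, 2.4, 3.1, 3.5]   # hypothetical values
    generated_logp = [0.8, 1.9, 2.6, 2.9, 4.0]   # hypothetical values
    print(ks_statistic(reference_logp, generated_logp))
```

A small D means the generated molecules reproduce the reference distribution of that descriptor closely; D = 1 means the two samples do not overlap at all.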