Machine learning for smell: Ordinal odor strength prediction of molecular perfumery components
Peter Fichtelmann, Julia Westermayr
TL;DR
This work tackles the prediction of odor strength, an important descriptor for fragrance design, by constructing an ordinal dataset of over 2,000 molecules from Good Scents and PubChem. It evaluates diverse molecular representations, including RDKit descriptors, Morgan/topological fingerprints, and pretrained encoders, under two learning schemes: a direct four-class ordinal predictor and a two-step pipeline that first classifies molecules as odorous or non-odorous and then predicts their strength. The best-performing model, a multilayer perceptron trained on 217 RDKit descriptors, achieves a macro MSE of approximately $0.53$ and an $R^2$ of roughly $0.57$ on hold-out data and generalizes to independent test molecules. SHAP analysis links polarity, molecular weight/size, ring features, and branching to odor-strength predictions, in line with mass-transport constraints, and offers mechanistic insight that supports in silico fragrance design.
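To make the two-step scheme and the macro MSE metric concrete, here is a minimal sketch, assuming the four classes are encoded as integers 0 = odorless, 1 = low, 2 = medium, 3 = high. The helper names `two_step_predict` and `macro_mse` are hypothetical, and the macro MSE definition used here (the per-class MSE averaged over the classes present in the ground truth, so each class counts equally despite imbalance) is an assumption; the paper's exact formula may differ.

```python
from statistics import mean

# Assumed ordinal encoding (illustrative only): 0 = odorless, 1 = low, 2 = medium, 3 = high.


def two_step_predict(is_odorous, strength):
    """Combine stage-1 and stage-2 outputs of a hypothetical two-step pipeline.

    is_odorous: stage-1 booleans (odorous vs. non-odorous).
    strength:   stage-2 strength predictions in {1, 2, 3}, used only when odorous.
    """
    return [s if odorous else 0 for odorous, s in zip(is_odorous, strength)]


def macro_mse(y_true, y_pred):
    """Average of per-class mean squared errors (assumed macro-MSE definition)."""
    per_class = []
    for c in sorted(set(y_true)):
        # Squared errors restricted to samples whose true class is c.
        errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(mean(errors))
    return mean(per_class)


if __name__ == "__main__":
    y_true = [0, 0, 1, 2, 3, 3]
    y_pred = two_step_predict([False, True, True, True, True, True], [1, 1, 1, 3, 3, 2])
    print(y_pred)                      # [0, 1, 1, 3, 3, 2]
    print(macro_mse(y_true, y_pred))   # 0.5
```

Averaging per class rather than per sample keeps a model from scoring well by simply favoring the most frequent strength category.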
Abstract
Predicting olfactory perception directly from molecular structure is central to fragrance design, which plays a role in a wide range of industries, such as perfumery, food and beverage, and health care. Among olfactory attributes, odor strength is a key factor in shaping odor perception, but its modeling has been impeded by scarce and fragmented intensity data. In this work, we introduce an ordinal odor strength data set of over 2,000 molecules by integrating two different public sources, mapping structures to odorless, low, medium, and high categories. We compared different prediction strategies across several molecular encodings and supervised learning algorithms. Dimensionality reduction and SHAP analysis identify molecular size, polarity, ring features, and branching as primary drivers, consistent with mass-transport constraints on volatility, sorption, and receptor access. This scalable ordinal framework enables reliable odor-strength estimation for novel molecules and provides a foundation for in silico fragrance design.
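The mapping of structures onto odorless, low, medium, and high categories can be sketched as follows. This is a minimal illustration under stated assumptions: the label strings, the use of a molecule key such as canonical SMILES, and the rule of dropping molecules whose two sources disagree are all hypothetical and are not taken from the paper. `encode_strength`, `odorous_target`, and `merge_sources` are illustrative names.

```python
# Hypothetical label vocabulary; real source field names and spellings may differ.
ORDINAL_LEVELS = ("odorless", "low", "medium", "high")
LEVEL_TO_INT = {name: i for i, name in enumerate(ORDINAL_LEVELS)}


def encode_strength(label: str) -> int:
    """Map a category string to its ordinal integer (0 = odorless ... 3 = high)."""
    key = label.strip().lower()
    if key not in LEVEL_TO_INT:
        raise ValueError(f"unknown odor strength label: {label!r}")
    return LEVEL_TO_INT[key]


def odorous_target(level: int) -> int:
    """Binary target for the first stage of a two-step scheme: 1 if odorous."""
    return int(level > 0)


def merge_sources(a: dict, b: dict) -> dict:
    """Merge two {molecule_key: label} sources into {molecule_key: ordinal level}.

    Assumed (hypothetical) rule: keep a molecule if all sources that list it
    agree on the encoded level; drop it if they conflict.
    """
    merged = {}
    for key in a.keys() | b.keys():
        levels = {encode_strength(src[key]) for src in (a, b) if key in src}
        if len(levels) == 1:
            merged[key] = levels.pop()
    return merged


if __name__ == "__main__":
    source_a = {"CCO": "Low", "c1ccccc1": "medium"}
    source_b = {"CCO": "low", "O": "odorless", "c1ccccc1": "high"}
    print(sorted(merge_sources(source_a, source_b).items()))
    # [('CCO', 1), ('O', 0)]  (benzene dropped because the sources conflict)
```

Encoding the categories as ordered integers, rather than as unordered one-hot classes, is what lets a regression-style error such as MSE penalize a high-vs-odorless confusion more than a high-vs-medium one.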
