Molecular Odor Prediction with Harmonic Modulated Feature Mapping and Chemically-Informed Loss

HongXin Xie, JianDe Sun, Yi Shao, Shuai Li, Sujuan Hou, YuLong Sun, Yuxiang Liu

TL;DR

The paper tackles molecular odor prediction under two challenges: non-smooth objective functions and severe label imbalance. It introduces Harmonic Modulated Feature Mapping (HMFM), which learns feature importance and applies frequency modulation for efficient, nonlinear encoding of molecular features, and Chemically-Informed Loss (CIL), which addresses label imbalance, structural consistency, and descriptor co-occurrence. The total loss combines five components weighted by tuned hyperparameters. HMFM and CIL are validated across mainstream deep learning models and show consistent improvements in F1 and AUROC, especially for minority descriptors. The approach provides a chemistry-informed framework that strengthens molecular structure representation and predicts odor descriptors more reliably, with potential applications in chemoinformatics, fragrance design, and environmental monitoring.
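To make the HMFM idea concrete, here is a minimal Python sketch under stated assumptions. Each feature is scaled by its own importance weight and then encoded as sine/cosine pairs at a few frequencies, in the spirit of Fourier-feature encodings. The function name `harmonic_feature_map`, the sin/cos form, and the frequency values are illustrative assumptions, not the authors' formulation. In a trained model the importance weights (and possibly the frequencies) would be learnable parameters rather than plain inputs.

```python
import math
from typing import List, Sequence


def harmonic_feature_map(
    x: Sequence[float],
    importance: Sequence[float],
    frequencies: Sequence[float],
) -> List[float]:
    """Hypothetical harmonic-modulated feature map (not the paper's exact formula).

    Each feature x_i is scaled by its own importance weight w_i, so features
    are never mixed with one another, and the scaled value is expanded into
    sin/cos harmonics at every frequency. Output length is
    len(x) * 2 * len(frequencies).
    """
    if len(x) != len(importance):
        raise ValueError("x and importance must have the same length")
    out: List[float] = []
    for xi, wi in zip(x, importance):
        scaled = wi * xi  # per-feature importance scaling (assumed form)
        for omega in frequencies:
            # frequency modulation: one sin/cos pair per harmonic
            out.append(math.sin(omega * scaled))
            out.append(math.cos(omega * scaled))
    return out


if __name__ == "__main__":
    features = [0.5, 1.2, -0.3]  # toy molecular descriptors
    weights = [1.0, 0.2, 0.0]    # weight 0 turns feature 3 into a constant (sin=0, cos=1)
    freqs = [1.0, 2.0, 4.0]      # hypothetical harmonic frequencies
    z = harmonic_feature_map(features, weights, freqs)
    print(len(z))  # 18 = 3 features x 2 (sin, cos) x 3 frequencies
```

The sketch applies a separate importance weight to each feature before the harmonics, rather than a dense projection that mixes features. This is one way to keep features independent while still giving the downstream model a nonlinear, multi-frequency view of each one.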

Abstract

Molecular odor prediction has great potential across diverse fields such as chemistry, pharmaceuticals, and environmental science, enabling the rapid design of new materials and enhancing environmental monitoring. However, current methods face two main challenges: first, existing models struggle with non-smooth objective functions and the complexity of mixed feature dimensions; second, datasets suffer from severe label imbalance, which hampers model training, particularly for minority class labels. To address these issues, we introduce a novel feature mapping method and a molecular ensemble optimization loss function. By incorporating feature importance learning and frequency modulation, our model adaptively adjusts the contribution of each feature, efficiently capturing the intricate relationship between molecular structures and odor descriptors. Our feature mapping preserves feature independence while using frequency modulation to make more efficient use of molecular features. Furthermore, the proposed loss function dynamically adjusts label weights, improves structural consistency, and strengthens label correlations, addressing both data imbalance and label co-occurrence. Experimental results show that our method significantly improves the accuracy of molecular odor prediction across various deep learning models, demonstrating its potential for molecular structure representation and chemoinformatics.
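The loss-function ideas in the abstract can be illustrated with a small stdlib-only Python sketch. It assumes inverse-frequency label weights, recomputed from the training labels. This is one possible reading of "dynamically adjusts label weights." It also assumes a hypothetical co-occurrence penalty that nudges the predicted probabilities of frequently co-occurring descriptors toward each other. The structural-consistency term, the paper's five-component formulation, and its hyperparameters are not reproduced. `sketch_loss`, `lam`, and the other names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def inverse_frequency_weights(labels: Sequence[Sequence[int]], smoothing: float = 1.0) -> List[float]:
    """Per-descriptor weights proportional to 1 / (count + smoothing), rescaled to mean 1.

    Rare descriptors get larger weights. Recomputing this per epoch or per batch
    would make the weighting dynamic (an assumption about how the loss adapts).
    """
    n_labels = len(labels[0])
    counts = [sum(row[j] for row in labels) for j in range(n_labels)]
    raw = [1.0 / (c + smoothing) for c in counts]
    mean = sum(raw) / n_labels
    return [r / mean for r in raw]


def weighted_bce(probs: Matrix, targets: Matrix, weights: Sequence[float], eps: float = 1e-7) -> float:
    """Multi-label binary cross-entropy with one weight per descriptor."""
    total = 0.0
    for p_row, y_row in zip(probs, targets):
        for p, y, w in zip(p_row, y_row, weights):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / (len(probs) * len(weights))


def cooccurrence_penalty(probs: Matrix, cooc: Matrix) -> float:
    """Hypothetical term pulling predictions of co-occurring descriptors together.

    cooc[j][k] in [0, 1] is an assumed normalised co-occurrence strength.
    """
    total = 0.0
    for p_row in probs:
        n = len(p_row)
        for j in range(n):
            for k in range(j + 1, n):
                total += cooc[j][k] * (p_row[j] - p_row[k]) ** 2
    return total / len(probs)


def sketch_loss(probs: Matrix, targets: Matrix, weights: Sequence[float], cooc: Matrix, lam: float = 0.1) -> float:
    """Weighted BCE plus a lam-scaled co-occurrence term (two terms, not the paper's five)."""
    return weighted_bce(probs, targets, weights) + lam * cooccurrence_penalty(probs, cooc)


if __name__ == "__main__":
    y = [[1, 0, 0], [1, 1, 0], [1, 0, 0]]  # toy multi-hot odor labels
    w = inverse_frequency_weights(y)       # rarest descriptor (index 2) gets the largest weight
    p = [[0.9, 0.2, 0.1], [0.8, 0.6, 0.2], [0.7, 0.1, 0.1]]
    c = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(w, sketch_loss(p, y, w, c))
```

Normalising the weights to mean 1 keeps the overall loss scale comparable to unweighted BCE. As a result, only the balance between frequent and rare descriptors shifts.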

Paper Structure

This paper contains 9 sections, 18 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Distribution of odor descriptor frequencies in the dataset.
  • Figure 2: Density distribution of molecular labels in the datasets.
  • Figure 3: Co-occurrence matrix of odor descriptors.
  • Figure 4: Histogram comparison of F1 scores for Harmonic Modulated Feature Mapping on mainstream deep learning models.
  • Figure 5: Histogram comparison of AUROC for Harmonic Modulated Feature Mapping on mainstream deep learning models.
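The statistics behind Figures 1 and 3 (per-descriptor frequency and the descriptor co-occurrence matrix) can be computed, in outline, from a multi-hot label matrix. The sketch below assumes labels are stored as one 0/1 row per molecule. The function name `descriptor_statistics` and the descriptor names are hypothetical, and any normalisation the authors apply before plotting is not reproduced.

```python
from typing import List, Sequence, Tuple


def descriptor_statistics(labels: Sequence[Sequence[int]]) -> Tuple[List[int], List[List[int]]]:
    """Count how often each odor descriptor appears and how often pairs appear together.

    labels: one multi-hot row per molecule (1 = descriptor present).
    Returns (freq, cooc); cooc[j][k] counts molecules carrying both j and k,
    so the diagonal cooc[j][j] equals freq[j].
    """
    n = len(labels[0])
    freq = [0] * n
    cooc = [[0] * n for _ in range(n)]
    for row in labels:
        active = [j for j, v in enumerate(row) if v]
        for j in active:
            freq[j] += 1
            for k in active:
                cooc[j][k] += 1
    return freq, cooc


if __name__ == "__main__":
    names = ["fruity", "sweet", "woody"]  # hypothetical descriptor names
    y = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
    freq, cooc = descriptor_statistics(y)
    print(dict(zip(names, freq)))  # {'fruity': 3, 'sweet': 2, 'woody': 2}
    print(cooc)                    # [[3, 2, 1], [2, 2, 0], [1, 0, 2]]
```

A long-tailed `freq` is what makes minority descriptors hard to learn, and large off-diagonal entries in `cooc` are the label correlations a co-occurrence-aware loss can exploit.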