Advancing Drug Discovery with Enhanced Chemical Understanding via Asymmetric Contrastive Multimodal Learning

Yifei Wang, Yunrui Li, Lin Liu, Pengyu Hong, Hao Xu

TL;DR

This work introduces Asymmetric Contrastive Multimodal Learning (ACML) for molecules, enabling cross-modal knowledge transfer from pre-trained chemical modalities into a shallow graph encoder to improve molecular representations for drug discovery. By freezing unimodal encoders (e.g., SMILES, images, NMR, GCMS/LCMS) and training a 5-layer graph encoder with asymmetric contrastive learning, ACML achieves expressive, interpretable embeddings while maintaining training efficiency. Across cross-modality retrieval, isomer discrimination, and molecular-property prediction on MoleculeNet and TDC, ACML demonstrates superior or competitive performance and reveals chemical semantics embedded in graph representations. The results highlight modality-specific strengths, efficient training, and enhanced interpretability, underscoring ACML’s potential to advance AI-driven chemical research and drug discovery.

Abstract

Multimodal deep learning holds considerable promise for advancing scientific research and practical applications, and cross-modal analysis in particular can open new frontiers in chemical understanding and drug discovery. We introduce Asymmetric Contrastive Multimodal Learning (ACML), an approach designed to enhance molecular understanding and accelerate drug discovery. ACML uses asymmetric contrastive learning to transfer information from various chemical modalities to molecular graph representations. By combining pre-trained chemical unimodal encoders with a shallow 5-layer graph encoder, ACML assimilates coordinated chemical semantics from different modalities, yielding comprehensive representation learning with efficient training. We demonstrate the effectiveness of this framework through large-scale cross-modality retrieval and isomer discrimination tasks. Additionally, ACML enhances interpretability by revealing chemical semantics in graph representations and strengthens the expressive power of graph neural networks, as evidenced by improved performance on molecular property prediction tasks from MoleculeNet and Therapeutics Data Commons (TDC). ACML thus shows potential to advance molecular representation learning, offering deeper insight into the chemical semantics of diverse modalities and supporting further progress in chemical research and drug discovery.
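
The contrastive objective described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a CLIP-style symmetric InfoNCE loss over L2-normalized embeddings; the temperature value, the embedding sizes, and the function names (`l2_normalize`, `log_softmax`, `clip_style_loss`) are illustrative assumptions, not details taken from the paper. In this reading, the asymmetry lies in training: the modality embeddings come from a frozen pre-trained encoder, and only the graph encoder would receive gradient updates. The sketch computes only the loss value on precomputed embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(graph_emb, modality_emb, temperature=0.1):
    # Assumption: row i of both inputs describes the same molecule (positive pair);
    # every other row in the batch acts as a negative.
    g = l2_normalize(graph_emb)
    m = l2_normalize(modality_emb)
    logits = g @ m.T / temperature
    idx = np.arange(len(g))
    loss_g2m = -log_softmax(logits, axis=1)[idx, idx].mean()  # graph -> modality
    loss_m2g = -log_softmax(logits, axis=0)[idx, idx].mean()  # modality -> graph
    return 0.5 * (loss_g2m + loss_m2g)

rng = np.random.default_rng(0)
# Stand-in for the output of a frozen, pre-trained modality encoder (hypothetical data).
frozen_modality_emb = rng.normal(size=(8, 16))
# Graph embeddings that nearly match their partners versus unrelated ones.
aligned_graph_emb = frozen_modality_emb + 0.01 * rng.normal(size=(8, 16))
random_graph_emb = rng.normal(size=(8, 16))

loss_aligned = clip_style_loss(aligned_graph_emb, frozen_modality_emb)
loss_random = clip_style_loss(random_graph_emb, frozen_modality_emb)
print(loss_aligned < loss_random)
```

Well-aligned pairs give a lower loss than unrelated pairs, which is the signal that would pull a graph embedding toward the frozen embedding of the same molecule and away from those of other molecules.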

Paper Structure (47 sections, 6 equations, 16 figures, 8 tables)

Figures (16)

  • Figure 1: The framework of ACML (Asymmetric Contrastive Multimodal Learning). a. The conceptual view of ACML. Multimodal learning on multiple pairs of graph and chemical modalities (SMILES, images, NMR, or mass spectrometry) is established through an asymmetric CLIP architecture (Radford et al., 2021). The graph encoder is trained through ACML, while the encoder of the chemical modality is pre-trained and kept frozen. Embeddings from the two modalities are projected into a joint latent space, aiming to align multiple views of the same molecule while distancing the embeddings of different molecules. b. Chemical knowledge transfer in ACML. For each chemical modality, a separate graph encoder is paired with it and trained from scratch through the ACML framework, enabling it to express the chemical semantics learned from the corresponding modality through its latent embeddings. c. Comprehensive downstream tasks to demonstrate the effectiveness, interpretability, and generalization ability of graph learning in ACML.
  • Figure 2: Cross-modality retrieval accuracy across all ACML models, including G-SMILES, G-Image, G-1H NMR, G-13C NMR, G-GCMS, and G-LCMS. The x-axis represents the size of the molecular pool, ranging from 1,000 (1K) to 1 million (1M) molecules on a log scale. Each model retrieves the target molecule by ranking similarity scores within this pool. The y-axis is the top-k accuracy, which measures whether the target molecule is among the k most similar molecules recommended by the model. The top-10 and top-100 accuracies of random guessing are included for reference. All models consistently identify the target molecule from pools exceeding 5,000 candidates. Notably, Image- and SMILES-based models maintain over 90% top-100 accuracy even with a pool of 1M molecules.
  • Figure 3: Graph embedding visualization via PCA. For each property, one representative figure is shown using the ACML instantiation that achieves the highest PCC score.
  • Figure S4: 13C NMR isomer discrimination demonstrations. The correct pairs of 13C NMR spectrum and molecule are presented horizontally. If the model identifies the correct molecule from the spectrum, the two are linked by a solid arrow and labeled "Matched". Out of the 4 challenging isomer pairs, our proposed model correctly identifies 3 isomers.
  • Figure S5: 1H NMR isomer discrimination demonstrations. The correct pairs of 1H NMR spectrum and molecule are presented horizontally. If the model identifies the correct molecule from the spectrum, the two are linked by a solid arrow and labeled "Matched". Out of the 4 challenging isomer pairs, our proposed model correctly identifies 3 isomers.
  • ...and 11 more figures
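
The top-k retrieval accuracy described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the ranking score and that query i's true match sits at pool index i; the function name `top_k_accuracy`, the pool size, the embedding dimension, and the noise level are all hypothetical and not taken from the paper. For a pool of N candidates, random guessing yields a top-k accuracy of k/N.

```python
import numpy as np

def top_k_accuracy(query_emb, pool_emb, k):
    # Assumption: the true match for query i is the pool entry at index i.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    sims = q @ p.T                                  # cosine similarity, (queries, pool)
    rows = np.arange(len(q))
    target = sims[rows, rows]                       # score of each query's true match
    # Rank = number of pool candidates that score strictly higher than the target.
    rank = (sims > target[:, None]).sum(axis=1)
    return float((rank < k).mean())

rng = np.random.default_rng(0)
pool = rng.normal(size=(1000, 32))                  # hypothetical candidate embeddings
queries = pool[:50] + 0.5 * rng.normal(size=(50, 32))  # noisy views of the first 50

acc_top10 = top_k_accuracy(queries, pool, k=10)
acc_top100 = top_k_accuracy(queries, pool, k=100)
random_top10 = 10 / len(pool)                       # random-guess baseline, k / N
print(acc_top10, acc_top100, random_top10)
```

Because the rank counts only candidates that score strictly higher than the target, top-k accuracy never decreases as k grows, so the top-100 value is at least the top-10 value.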