Vision Language Model is NOT All You Need: Augmentation Strategies for Molecule Language Models

Namkyeong Lee; Siddhartha Laghuvarapu; Chanyoung Park; Jimeng Sun

TL;DR

The paper tackles the two main bottlenecks of molecule language models (MoLMs): limited molecule-text paired data and uneven expert coverage. It introduces AMOLE, which augments molecule-text pairs by sharing descriptions among structurally similar molecules under a structural similarity preserving loss, and transfers expert knowledge via an expertise reconstruction loss, enabling richer cross-modal representations. In experiments on zero-shot cross-modal retrieval, zero-shot question answering, molecular property prediction, and zero-shot virtual screening, AMOLE shows consistent improvements and practical potential for drug discovery. The work also shows that AMOLE applies to architectures beyond CLIP-style models and discusses directions to further expand descriptor coverage using decoder-based generation and style transfer.

Abstract

Recently, there has been growing interest among researchers in understanding molecules and their textual descriptions through molecule language models (MoLMs). However, despite some promising early developments, the advancement of MoLMs still trails significantly behind that of vision language models (VLMs). This is because MoLMs face unique challenges that VLMs do not: 1) a limited amount of molecule-text paired data and 2) missing expertise, which arises because experts focus on their own specialized areas. To this end, we propose AMOLE, which 1) augments molecule-text pairs with a structural similarity preserving loss, and 2) transfers expertise between molecules. Specifically, AMOLE enriches molecule-text pairs by sharing descriptions among structurally similar molecules with a novel structural similarity preserving loss. Moreover, we propose an expertise reconstruction loss to transfer knowledge from molecules that have extensive expertise to those with less expertise. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery. The source code for AMOLE is available at https://github.com/Namkyeong/AMOLE.
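
To make the idea of sharing descriptions among structurally similar molecules concrete, the following is a minimal sketch and not the authors' implementation. It assumes each molecule is represented by the set of "on" bit indices of a binary fingerprint and uses Tanimoto similarity (the outline lists it under Preliminaries) to pick neighbors. The function names, the toy fingerprints, and the top-`k` neighbor rule are all assumptions made for illustration; how AMOLE actually selects neighbors and uses the similarity in its loss is defined in the paper and its repository.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| between two sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def top_k_neighbors(query: str, fps: Dict[str, Set[int]], k: int) -> List[str]:
    """Names of the k molecules most similar to `query` (hypothetical helper)."""
    scored = [
        (tanimoto(fps[query], fp), name)
        for name, fp in fps.items()
        if name != query
    ]
    # Sort by descending similarity; break ties by name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:k]]


def augment_pairs(
    fps: Dict[str, Set[int]],
    descriptions: Dict[str, List[str]],
    k: int,
) -> List[Tuple[str, str, float]]:
    """Pair each molecule with its neighbors' descriptions.

    Returns (molecule, borrowed_text, similarity) triples; the similarity is
    kept so a downstream loss could weight borrowed pairs (an assumption).
    """
    augmented = []
    for mol in fps:
        for nb in top_k_neighbors(mol, fps, k):
            sim = tanimoto(fps[mol], fps[nb])
            for text in descriptions.get(nb, []):
                augmented.append((mol, text, sim))
    return augmented


# Toy data: the bit indices below are made up, not real fingerprints.
toy_fps = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}}
toy_desc = {"B": ["description written for B"]}
pairs = augment_pairs(toy_fps, toy_desc, k=1)
```

In this toy run, molecule `A` borrows the description written for `B` because `B` is its nearest neighbor. Keeping the similarity score alongside each borrowed pair is one way to avoid treating a borrowed description as equally reliable as an original one.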

Paper Structure (30 sections, 7 equations, 9 figures, 7 tables)

Introduction
Related Works
Preliminaries
Problem Statement
Tanimoto Similarity
Molecule-Text Contrastive Learning
Methodology
Augmenting Molecule-Text pairs
Structural Similarity Preserving Loss
Expertise Transfer Module
Model Training
Experiments
Experimental Setup
Zero-Shot Cross-Modal Retrieval
Zero-Shot Question and Answering
...and 15 more sections

Figures (9)

Figure 1: (a) Rivastigmine's textual descriptions from various experts. (b) The majority of molecules in the PubChem database have only one description provided by an expert.
Figure 2: Overall model architecture of AMOLE.
Figure 3: Sensitivity analysis on $k$.
Figure 4: Hit rate (%) in zero-shot virtual screening task.
Figure 5: Sensitivity analysis on $\alpha$.
...and 4 more figures
