InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

He Cao, Zijing Liu, Xingyu Lu, Yuan Yao, Yu Li

TL;DR

InstructMol presents a two-stage, multimodal instruction-tuning framework that aligns molecular graphs and sequences with natural language to create a versatile drug-discovery assistant. By freezing the molecular encoder and training a lightweight alignment projector followed by task-specific LoRA adapters on a Vicuna-based LLM, it achieves substantial gains across property prediction, molecule description, and chemical reaction tasks compared to generalist LLMs and approaches that lack robust modality alignment. The approach effectively narrows the gap with specialist models while preserving the adaptability and open-ended reasoning of LLMs, and demonstrates the value of modular, data-efficient multimodal fine-tuning in molecular science. Limitations include data scarcity, the need for domain-specific LLMs, and concerns about reliability and hallucinations, guiding future work toward richer datasets and safer, more robust evaluations.
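
The two-stage recipe summarized above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the `STAGES` table, the `LoRALinear` class, and the choice to treat the LLM as frozen in stage 1 are assumptions made for illustration, and whether the projector keeps updating in stage 2 is not stated in this summary. The LoRA part shows the standard low-rank update, y = Wx + (alpha / r) * B(Ax), with the base weight W left untouched.

```python
import random

# Hypothetical stage table (an assumption, not taken from the paper's code):
# which parameter groups receive gradient updates in each stage.
STAGES = {
    1: {"trainable": {"projector"},
        "frozen": {"molecule_encoder", "llm"}},          # LLM frozen: assumed
    2: {"trainable": {"lora_adapters"},
        "frozen": {"molecule_encoder", "llm_base_weights"}},
}


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen linear map plus a trainable low-rank correction B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, never updated
        # A gets a small random init; B starts at zero, so the adapter
        # contributes nothing until training changes B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = LoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]], rank=1)
x = [2.0, 3.0]
out_at_init = layer.forward(x)  # identical to the frozen layer's output
```

Because `B` starts at zero, adding the adapter leaves the pretrained model's behavior unchanged at the start of stage 2; only the small `A` and `B` matrices would then be fitted per task.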

Abstract

The rapid evolution of artificial intelligence in drug discovery encounters challenges with generalization and extensive training, yet Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data. Our novel contribution, InstructMol, a multi-modal LLM, effectively aligns molecular structures with natural language via an instruction-tuning approach, utilizing a two-stage training strategy that adeptly combines limited domain-specific data with molecular and textual information. InstructMol showcases substantial performance improvements in drug discovery-related molecular tasks, surpassing leading LLMs and significantly reducing the gap with specialized models, thereby establishing a robust foundation for a versatile and dependable drug discovery assistant.

Paper Structure

This paper contains 40 sections, 1 equation, 8 figures, 15 tables.

Figures (8)

  • Figure 1: Empowering LLMs with molecular modalities to unlock the drug discovery domain and serve as assistants in molecular research.
  • Figure 2: Overview of the InstructMol model architecture and two-stage training paradigm. The example molecule in the figure is Terephthalaldehyde (CID 12173).
  • Figure 3: Comparison of biomolecule-domain molecule-text dataset scale with existing general domain vision-language datasets.
  • Figure 4: More examples of the molecule description generation task on the ChEBI-20 Text2Mol test set. We include Mol-Instructions as the baseline. CID: PubChem Compound Identification, a non-zero integer PubChem accession identifier for a unique chemical structure.
  • Figure 5: More examples of the forward reaction prediction task. We include Mol-Instructions and Multitask-Text-and-Chemistry-T5 (Text+ChemT5) as baselines.
  • ...and 3 more figures