InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery
He Cao, Zijing Liu, Xingyu Lu, Yuan Yao, Yu Li
TL;DR
InstructMol presents a two-stage, multimodal instruction-tuning framework that aligns molecular graphs and sequences with natural language to create a versatile drug-discovery assistant. By freezing the molecular encoder and training a lightweight alignment projector followed by task-specific LoRA adapters on a Vicuna-based LLM, it achieves substantial gains across property prediction, molecule description, and chemical reaction tasks compared to generalist LLMs and approaches that lack robust modality alignment. The approach effectively narrows the gap with specialist models while preserving the adaptability and open-ended reasoning of LLMs, and demonstrates the value of modular, data-efficient multimodal fine-tuning in molecular science. Limitations include data scarcity, the need for domain-specific LLMs, and concerns about reliability and hallucinations, guiding future work toward richer datasets and safer, more robust evaluations.
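The division of labor described above (a frozen molecular encoder, a small trainable projector, and low-rank adapters on an otherwise frozen LLM) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy dimensions, the function names, and the single linear layer standing in for the projector are all assumptions made for illustration. The adapter follows the standard LoRA formulation, y = Wx + (alpha / r) * B(Ax), with B initialized to zero.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def project(P, graph_feature):
    """Alignment projector: map a frozen-encoder feature into the LLM embedding space."""
    return matvec(P, graph_feature)

def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x); W stays frozen, only A and B would train."""
    r = len(A)  # adapter rank = number of rows in A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

def lora_param_counts(d_out, d_in, r):
    """Return (parameters in the full weight, parameters in the LoRA adapter)."""
    return d_out * d_in, r * (d_in + d_out)

# Hypothetical toy sizes, chosen only for illustration (not taken from the paper).
rng = random.Random(0)
d_graph, d_llm, rank = 3, 4, 2
P = [[rng.uniform(-1, 1) for _ in range(d_graph)] for _ in range(d_llm)]  # trained in stage 1
W = [[rng.uniform(-1, 1) for _ in range(d_llm)] for _ in range(d_llm)]    # frozen LLM weight
A = [[rng.uniform(-1, 1) for _ in range(d_llm)] for _ in range(rank)]     # trained in stage 2
B = [[0.0] * rank for _ in range(d_llm)]                                  # zero init: no change at start

# A made-up encoder output is projected to an LLM-sized vector, then passed
# through one adapted layer.
token = project(P, [0.5, -1.0, 2.0])
adapted = lora_forward(W, A, B, token, alpha=4.0)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, so fine-tuning begins from the pretrained LLM's behavior. The parameter count shows why such adapters are lightweight: for an assumed 4096 x 4096 weight with rank 8, `lora_param_counts(4096, 4096, 8)` gives 16,777,216 full parameters versus 65,536 adapter parameters.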
Abstract
Artificial intelligence is evolving rapidly in drug discovery, but it still faces challenges in generalization and the need for extensive training. Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data. We introduce InstructMol, a multi-modal LLM that aligns molecular structures with natural language through instruction tuning. It uses a two-stage training strategy that combines limited domain-specific data with molecular and textual information. InstructMol shows substantial performance improvements on drug discovery-related molecular tasks, surpassing leading LLMs and significantly reducing the gap with specialized models, thereby establishing a robust foundation for a versatile and dependable drug discovery assistant.
