Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

Gang Liu; Michael Sun; Wojciech Matusik; Meng Jiang; Jie Chen

TL;DR

We address the challenge of integrating graph-structured molecular data with large language models for inverse molecular design. We propose Llamole, a graph-text multimodal LLM that interleaves text, molecular graphs, and reactions using trigger-query mechanisms and an A* planner. We validate on MolQA and MolPair, showing substantial gains over 14 adapted LLM baselines in controllability and retrosynthetic planning: retrosynthesis success rises from 5.5% to 35%, and property control improves by up to 80.9%. The work demonstrates the value of graph-text multimodality for practical molecular discovery and provides new datasets and a benchmarking framework for future research.

Abstract

While large language models (LLMs) have integrated images, adapting them to graphs remains challenging, limiting their applications in materials and drug design. This difficulty stems from the need for coherent autoregressive generation across texts and graphs. To address this, we introduce Llamole, the first multimodal LLM capable of interleaved text and graph generation, enabling molecular inverse design with retrosynthetic planning. Llamole integrates a base LLM with the Graph Diffusion Transformer and Graph Neural Networks for multi-conditional molecular generation and reaction inference within texts, while the LLM, with enhanced molecular understanding, flexibly controls activation among the different graph modules. Additionally, Llamole integrates A* search with LLM-based cost functions for efficient retrosynthetic planning. We create benchmarking datasets and conduct extensive experiments to evaluate Llamole against in-context learning and supervised fine-tuning. Llamole significantly outperforms 14 adapted LLMs across 12 metrics for controllable molecular design and retrosynthetic planning.
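
The module switching described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `interleave` and the two stub modules are hypothetical names introduced here, and the stubs return placeholder strings where the Graph Diffusion Transformer and the reaction-predicting GNN would return a molecule and a one-step reaction. Only the trigger tokens `<design>` and `<retro>` come from the paper (Figure 3). For simplicity, the sketch replaces each trigger token with the module's output and passes the preceding tokens as the condition, standing in for the `<query>`-token summary.

```python
from typing import Callable, Dict, Iterable, List

# Trigger tokens named in the paper's Figure 3 caption.
DESIGN = "<design>"
RETRO = "<retro>"

# Hypothetical type: a graph module maps the preceding context to a placeholder string.
GraphModule = Callable[[List[str]], str]


def interleave(tokens: Iterable[str], modules: Dict[str, GraphModule]) -> List[str]:
    """Pass text tokens through; on a trigger token, call the matching module.

    The module receives everything generated so far as its condition. This is
    an assumed stand-in for the LLM output vectors that summarize past text.
    """
    output: List[str] = []
    for token in tokens:
        if token in modules:
            # Switch from the "LLM" to the graph module for this step.
            output.append(modules[token](list(output)))
        else:
            output.append(token)
    return output


def stub_designer(context: List[str]) -> str:
    # Placeholder for a conditional molecule generator (assumption).
    return f"[molecule conditioned on {len(context)} tokens]"


def stub_retro(context: List[str]) -> str:
    # Placeholder for a one-step reaction predictor (assumption).
    return f"[reaction conditioned on {len(context)} tokens]"


stream = ["Design", "a", "soluble", "molecule", DESIGN, "then", "plan", RETRO]
result = interleave(stream, {DESIGN: stub_designer, RETRO: stub_retro})
print(" ".join(result))
```

In this toy setting the dispatch table keeps the text loop unaware of what each graph module does, which mirrors the abstract's description of parameter-frozen graph modules activated by the LLM.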

Paper Structure (42 sections, 6 equations, 11 figures, 4 tables)

This paper contains 42 sections, 6 equations, 11 figures, 4 tables.

Introduction
Preliminaries
Autoregressive Language Modeling
Molecular Design with Graph Diffusion Models
One-Step Reaction Prediction with Graph Neural Networks
Retrosynthetic Planning with A* Search
Llamole: Multimodal Large Language Model for Molecular Discovery
Multimodal Autoregressive Modeling
Llamole Design Space
End-to-End Model Fine-Tuning and Generation
Benchmarking for Multimodal Molecular Design
Experiment
RQ1: LLMs for Controllable and Synthesizable Molecular Design
RQ2: Discussion on Controllable Molecular Generation
Ablation Studies on LLM and Graph DiT Synergy
...and 27 more sections

Figures (11)

Figure 1: Comparison of Controllability: Results are averaged from the best numbers from Table \ref{tab:design-performance}.
Figure 2: Three LLM-based methods for molecular design. The question outlines requirements for properties, structures, and synthesis, addressed as follows: (a) In-Context Learning and (b) Supervised Fine-Tuning use text-only data for demonstrations and instruction tuning, respectively. (c) The proposed Llamole uses graph-text multimodal data to fine-tune the LLM, integrating parameter-frozen graph models for interleaved text and graph generation with reaction inference.
Figure 3: Overview of Llamole: Trigger tokens (<design> and <retro>) switch active modules from the base LLM to the respective graph component. The subsequent <query> token utilizes output vectors from the LLM to summarize past texts as conditions. Using these, Llamole generates molecules and predicts one-step reactions. Enhanced with a graph encoder and A* search, Llamole efficiently plans synthesis routes through selection and expansion iterations on the AND-OR Tree.
Figure 4: Creation of MolQA and MolPair: MolQA comprises two sets: a training set for ICL and (multimodal) SFT, and a test set for evaluation. MolPair consists of graph-text and reaction-text pairs, with red highlights indicating synthetic complexity, structure, and properties information.
Figure 5: Overall Comparison of LLMs for Controllability and Synthesizability: Performance is ranked by averaged BA/MAE (x-axis) and retrosynthesis success rate (y-axis). Circle size indicates model size. LLMs with ICL, SFT, and Llamole are highlighted in blue, orange, and red, respectively.
...and 6 more figures
