Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages

Michael Sun, Weize Yuan, Gang Liu, Wojciech Matusik, Jie Chen

TL;DR

FMG addresses the challenge of learning interpretable molecular grammars from limited data by leveraging multi-modal foundation models to align image- and text-based representations of molecules. It reframes grammar induction as a hierarchical, MMFM-guided clique-tree decomposition that is converted into a hyperedge-replacement grammar, enabling robust generation and design of molecules with built-in interpretability. The approach combines step-by-step MMFM reasoning, LLM-based tournaments for rule quality, and stochastic grammar pooling to achieve data-efficient synthesis, higher diversity, and class membership in molecular discovery workflows. Empirical results on small and real-world datasets show FMG outperforms existing grammar-based and ML baselines on synthesizability, diversity, and data efficiency, with strong alignment between expert judgments and LLM evaluations. The work provides a scalable, interpretable foundation for automated molecular design, with code and prompts enabling broader adoption.

Abstract

Recent data-efficient molecular generation approaches exploit graph grammars to introduce interpretability into the generative models. However, grammar learning therein relies on expert annotation or unreliable heuristics for algorithmic inference. We propose Foundation Molecular Grammar (FMG), which leverages multi-modal foundation models (MMFMs) to induce an interpretable molecular language. By exploiting the chemical knowledge of an MMFM, FMG renders molecules as images, describes them as text, and aligns information across modalities using prompt learning. FMG can be used as a drop-in replacement for the prior grammar learning approaches in molecular generation and property prediction. We show that FMG not only excels in synthesizability, diversity, and data efficiency but also offers built-in chemical interpretability for automated molecular discovery workflows. Code is available at https://github.com/shiningsunnyday/induction.

Paper Structure

This paper contains 45 sections, 12 figures, 9 tables.

Figures (12)

  • Figure 1: Main modules of the FMG algorithm: (left) we initialize base cliques using bonds and minimal rings; (left-middle) we triangulate the clique graph to guarantee the existence of a clique tree; (middle) we prompt the MMFM to meaningfully merge pairs of motifs; (middle-right) we eliminate cycles in the clique graph by prompting the MMFM to identify the least important interactions; (right) we prompt the MMFM to select the root motif, completing the tree.
  • Figure 2: Example of conversion from a clique tree to HRG production rules. (Left) Each node of the clique tree contains a substructure (red), with edges corresponding to shared bonds between substructures; (Right-top) rule extracted from the second clique of the tree, with a non-terminal hyperedge as the LHS and the clique's substructure as the RHS; (Right-bottom) example of applying the rule, where dashed connections mark corresponding bonds and atoms.
  • Figure 3: Our workflow takes as input a class-specific dataset and a collection of prompts (left); executes the tree decomposition algorithm with the MMFM as a decision-making module (left middle); converts the parse tree into a production rule set (left-right), resolving discrepancies across runs with a non-expert LLM judge; and infers a grammar that can generate new class-specific samples (right).
  • Figure 4: We vary $k$ from 1 to 10 for FMG and from 2 to 10 for FMG-Text (see Sec. \ref{ablation:fmg-text}). Full FMG results are in App. \ref{app:ensemble}. Full comparison between FMG and FMG-Text is in App. \ref{app:motivate-images}.
  • Figure 5: Visualization of results across all 5 evaluation metrics.
  • ...and 7 more figures
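
The decomposition outlined in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the triangulation and motif-merging steps are omitted, the hard-coded `scores` dictionary is a hypothetical stand-in for the MMFM's judgment of how important each clique interaction is, and choosing the largest clique as root is a placeholder for the MMFM's root selection. All function names are illustrative. The toy molecule is an isopropylbenzene-like skeleton (ring atoms 0 to 5, side-chain atoms 6 to 8).

```python
from typing import Dict, FrozenSet, List, Tuple

Clique = FrozenSet[int]   # a set of atom indices
Edge = Tuple[int, int]    # a pair of clique indices


def base_cliques(bonds: List[Edge], rings: List[List[int]]) -> List[Clique]:
    """Base cliques: every ring, plus every bond not already covered by a clique."""
    cliques = [frozenset(r) for r in rings]
    for a, b in bonds:
        if not any(a in c and b in c for c in cliques):
            cliques.append(frozenset((a, b)))
    return cliques


def clique_graph(cliques: List[Clique]) -> List[Edge]:
    """Connect two cliques whenever they share at least one atom."""
    n = len(cliques)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if cliques[i] & cliques[j]]


def prune_to_tree(n: int, edges: List[Edge], scores: Dict[Edge, float]) -> List[Edge]:
    """Kruskal-style maximum spanning tree: keep high-scoring edges, drop
    the least important edge of every cycle. Scores are assumed inputs."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for i, j in sorted(edges, key=lambda e: scores[e], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not close a cycle
            parent[ri] = rj
            kept.append((i, j))
    return kept


# Toy input: a six-membered ring (atoms 0-5) with a branched side chain.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7), (6, 8)]
rings = [[0, 1, 2, 3, 4, 5]]

cliques = base_cliques(bonds, rings)
edges = clique_graph(cliques)

# Hypothetical importance scores; in FMG these decisions come from the MMFM.
scores = {(0, 1): 0.9, (1, 2): 0.8, (1, 3): 0.7, (2, 3): 0.2}
tree = prune_to_tree(len(cliques), edges, scores)

# Placeholder root choice: the largest clique (here, the ring).
root = max(range(len(cliques)), key=lambda i: len(cliques[i]))
```

On this input the three bond cliques around atom 6 form a triangle in the clique graph, and pruning removes its lowest-scoring edge, leaving a tree rooted at the ring. Replacing `scores` and the root rule with model-driven choices is where the MMFM would enter.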