MolGround: A Benchmark for Molecular Grounding

Jiaxin Wu, Ting Zhang, Rubing Chen, Wengyu Zhang, Chen Jason Zhang, Xiao-Yong Wei, Li Qing

TL;DR

MolGround introduces a large-scale molecular grounding benchmark that evaluates the referential aspect of molecular understanding, defining five fine-grained grounding tasks and assembling 117k QA pairs. It couples NLP-inspired grounding with cheminformatics through a multi-agent grounding prototype that uses PubChem, LLMs, and RDKit to connect textual references with structural components. In the experiments, standard pretrained models struggle on the fine-grained tasks, while the grounding agent and retrieval/fine-tuning strategies yield improvements and benefit downstream applications such as molecular captioning and ATC classification. The work advances interpretability and enables referential perception in AI for Science, offering a scalable framework and dataset for future research in molecular grounding.

Abstract

Current molecular understanding approaches predominantly focus on the descriptive aspect of human perception, providing broad, topic-level insights. However, the referential aspect -- linking molecular concepts to specific structural components -- remains largely unexplored. To address this gap, we propose a molecular grounding benchmark designed to evaluate a model's referential abilities. We align molecular grounding with established conventions in NLP, cheminformatics, and molecular science, showcasing the potential of NLP techniques to advance molecular understanding within the AI for Science movement. Furthermore, we constructed the largest molecular understanding benchmark to date, comprising 117k QA pairs, and developed a multi-agent grounding prototype as proof of concept. This system outperforms existing models, including GPT-4o, and its grounding outputs have been integrated to enhance traditional tasks such as molecular captioning and ATC (Anatomical, Therapeutic, Chemical) classification.

Paper Structure

This paper contains 18 sections, 7 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: A referential framework for fine-grained molecular grounding, comprising five tasks: Chemical Name Entity Recognition, Name-Structure Mapping, Referential Substructure Localization, Substructure Relationship Grounding, and Substructure Frequency Analysis, demonstrated through a running example.
  • Figure 2: Diversity in naming conventions and multimodal gaps between the textual and structural forms.
  • Figure 3: Multiple instances of thiophene rings, varying by rotations, present a challenge in identifying a generalizable feature for localization. Additionally, selenophene rings, differing by only one atom from thiophene, may further complicate localization.
  • Figure 4: The relationship of "functional integration" between the thiophene (yellow) and selenophene (blue) rings varies significantly across different molecules (attached in one case but distinctly separate in another).
  • Figure 5: Comparison of the ground truth and the grounding outputs of GPT-4o and the grounding agent.
  • ...and 2 more figures
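
To make the localization task in the figure captions concrete, the sketch below scores a predicted substructure location against a reference one. Only the five task names are taken from the Figure 1 caption. The record layout (`LocalizationAnswer`), the Jaccard-style `atom_overlap` score, and the example molecule are assumptions for illustration and are not claimed to be the benchmark's actual data format or evaluation metric.

```python
from dataclasses import dataclass

# The five task names come from the Figure 1 caption; everything else here
# (field names, the overlap score) is a hypothetical illustration.
TASKS = (
    "Chemical Name Entity Recognition",
    "Name-Structure Mapping",
    "Referential Substructure Localization",
    "Substructure Relationship Grounding",
    "Substructure Frequency Analysis",
)


@dataclass(frozen=True)
class LocalizationAnswer:
    """Atom indices (positions in the SMILES string) claimed for one substructure."""
    substructure: str
    atom_indices: frozenset


def atom_overlap(pred: LocalizationAnswer, gold: LocalizationAnswer) -> float:
    """Jaccard overlap between predicted and reference atom sets (assumed metric)."""
    if pred.substructure != gold.substructure:
        return 0.0
    union = pred.atom_indices | gold.atom_indices
    if not union:
        return 1.0
    return len(pred.atom_indices & gold.atom_indices) / len(union)


# Example molecule: a thiophene ring bonded to a selenophene ring,
# SMILES c1csc(c1)-c1ccc[se]1. Atoms 0 to 4 form the thiophene ring,
# atoms 5 to 9 form the selenophene ring.
gold = LocalizationAnswer("thiophene", frozenset({0, 1, 2, 3, 4}))
exact = LocalizationAnswer("thiophene", frozenset({0, 1, 2, 3, 4}))
confused = LocalizationAnswer("thiophene", frozenset({5, 6, 7, 8, 9}))
partial = LocalizationAnswer("thiophene", frozenset({2, 3, 4, 5}))

print(atom_overlap(exact, gold))     # correct ring
print(atom_overlap(confused, gold))  # selenophene ring mistaken for thiophene
print(atom_overlap(partial, gold))   # half-right answer spanning both rings
```

The `confused` case mirrors the difficulty described for Figure 3: two rings that differ by a single atom occupy disjoint atom sets, so an answer that picks the wrong ring gets no credit under an atom-level score even though the rings look almost identical.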