MolGround: A Benchmark for Molecular Grounding
Jiaxin Wu, Ting Zhang, Rubing Chen, Wengyu Zhang, Chen Jason Zhang, Xiao-Yong Wei, Li Qing
TL;DR
MolGround introduces a large-scale molecular grounding benchmark for evaluating the referential aspect of molecular understanding. It defines five fine-grained grounding tasks and assembles 117k QA pairs. It couples NLP-inspired grounding with cheminformatics through a multi-agent grounding prototype that leverages PubChem, LLMs, and RDKit to connect textual references with structural components. In extensive experiments, standard pretrained models struggle on the fine-grained tasks, while the grounding agent and retrieval/fine-tuning strategies yield improvements and benefit downstream applications such as molecular captioning and ATC classification. The work advances interpretability and enables referential perception in AI for Science, offering a scalable framework and dataset for future research on molecular grounding.
Abstract
Current molecular understanding approaches predominantly focus on the descriptive aspect of human perception, providing broad, topic-level insights. However, the referential aspect, which links molecular concepts to specific structural components, remains largely unexplored. To address this gap, we propose a molecular grounding benchmark designed to evaluate a model's referential abilities. We align molecular grounding with established conventions in NLP, cheminformatics, and molecular science, showcasing the potential of NLP techniques to advance molecular understanding within the AI for Science movement. Furthermore, we construct the largest molecular understanding benchmark to date, comprising 117k QA pairs, and develop a multi-agent grounding prototype as a proof of concept. This system outperforms existing models, including GPT-4o, and its grounding outputs have been integrated to enhance traditional tasks such as molecular captioning and ATC (Anatomical Therapeutic Chemical) classification.
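To make the referential aspect concrete, the sketch below grounds one textual mention, "carboxylic acid group", to the atom indices that realize it in aspirin. This is a hedged illustration only and not the authors' multi-agent prototype. It uses a hand-written graph matcher in plain Python so that it runs without dependencies; in practice a cheminformatics toolkit such as RDKit would normally perform this step with SMARTS substructure matching. The `Molecule` class, `ground_carboxylic_acid`, the `GROUNDERS` table, and the `ground` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    # Heavy-atom graph (hypothetical representation): element symbol per atom
    # index, attached hydrogen count per atom, and bonds as (i, j, order).
    elements: list[str]
    h_counts: list[int]
    bonds: list[tuple[int, int, int]]

    def neighbors(self, i: int) -> list[tuple[int, int]]:
        """Return (neighbor_index, bond_order) pairs for atom i."""
        out = []
        for a, b, order in self.bonds:
            if a == i:
                out.append((b, order))
            elif b == i:
                out.append((a, order))
        return out


def ground_carboxylic_acid(mol: Molecule) -> list[tuple[int, int, int]]:
    """Ground 'carboxylic acid group' to (C, carbonyl O, hydroxyl O) index triples."""
    matches = []
    for c, elem in enumerate(mol.elements):
        if elem != "C":
            continue
        nbrs = mol.neighbors(c)
        carbonyl_o = [n for n, order in nbrs if mol.elements[n] == "O" and order == 2]
        # The hydroxyl oxygen must carry one H; this excludes ester oxygens.
        hydroxyl_o = [n for n, order in nbrs
                      if mol.elements[n] == "O" and order == 1 and mol.h_counts[n] == 1]
        if carbonyl_o and hydroxyl_o:
            matches.append((c, carbonyl_o[0], hydroxyl_o[0]))
    return matches


# Hypothetical lookup from a textual mention to a structural grounding function.
GROUNDERS = {"carboxylic acid group": ground_carboxylic_acid}


def ground(mention: str, mol: Molecule) -> list[tuple[int, int, int]]:
    return GROUNDERS[mention.lower()](mol)


# Aspirin, atoms indexed in the order of the SMILES CC(=O)Oc1ccccc1C(=O)O,
# with the benzene ring written in a Kekule form (alternating 1/2 bonds).
ASPIRIN = Molecule(
    elements=["C", "C", "O", "O", "C", "C", "C", "C", "C", "C", "C", "O", "O"],
    h_counts=[3, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    bonds=[(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1), (4, 5, 2), (5, 6, 1),
           (6, 7, 2), (7, 8, 1), (8, 9, 2), (9, 4, 1), (9, 10, 1),
           (10, 11, 2), (10, 12, 1)],
)

if __name__ == "__main__":
    print(ground("carboxylic acid group", ASPIRIN))  # → [(10, 11, 12)]
```

The ester group (atoms 1, 2, 3) is correctly not grounded, because its single-bonded oxygen carries no hydrogen. Distinguishing such near-identical substructures is exactly the kind of fine-grained referential decision that descriptive, topic-level outputs do not expose.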
