Structural Reasoning Improves Molecular Understanding of LLM
Yunhui Jang, Jaehyung Kim, Sungsoo Ahn
TL;DR
This work identifies a persistent gap in LLMs' ability to reason about molecular structure and demonstrates that explicit structural reasoning is essential for accurate molecular understanding. It introduces Molecular Structural Reasoning (MSR), a two-stage framework consisting of a reasoning module and an answering module. MSR handles both analytic and synthetic scenarios and leverages external tools such as RDKit for deterministic structure extraction. Across molecule-to-text, retrosynthesis, and text-to-molecule tasks, MSR yields consistent improvements for both chemical and general LLMs, with notable gains in description quality, synthesis accuracy, and generation fidelity, including state-of-the-art performance in several settings. The results underscore the value of explicit, componentized structural reasoning in domain-specific LLMs. The paper also reports ablations and limitations, such as partial difficulties with certain structural elements and descriptor interactions, which guide future improvements and support reproducibility.
Abstract
Recently, large language models (LLMs) have shown significant progress, approaching human perception levels. In this work, we demonstrate that despite these advances, LLMs still struggle to reason using molecular structural information. This gap is critical because many molecular properties, including functional groups, depend heavily on such structural details. To address this limitation, we propose an approach that sketches molecular structures for reasoning. Specifically, we introduce the Molecular Structural Reasoning (MSR) framework, which enhances the molecular understanding of LLMs by explicitly incorporating key structural features. We present two frameworks, one for scenarios where the target molecule is known and one for scenarios where it is unknown. Extensive experiments verify that MSR improves molecular understanding.
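
The known-molecule case described above, where structural features are extracted deterministically with RDKit and passed explicitly to the LLM, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extract_structure` and `build_prompt`, the prompt layout, the choice of features, and the small SMARTS table are all assumptions made for illustration. Only the RDKit calls themselves (`Chem.MolFromSmiles`, `Chem.MolFromSmarts`, `HasSubstructMatch`, and the `rdMolDescriptors` functions) are existing library APIs.

```python
from typing import Dict

# Illustrative SMARTS patterns. This small table is an assumption for the
# sketch, not the set of structural elements used in the paper.
FUNCTIONAL_GROUPS: Dict[str, str] = {
    "carboxylic acid": "C(=O)[OX2H1]",
    "ester": "[#6]C(=O)O[#6]",
    "primary amine": "[NX3;H2][#6]",
}


def extract_structure(smiles: str) -> Dict[str, object]:
    """Deterministically extract structural facts from a SMILES string."""
    # Imported lazily so build_prompt below can be used without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # Keep the names of all patterns that occur at least once in the molecule.
    groups = [
        name
        for name, smarts in FUNCTIONAL_GROUPS.items()
        if mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))
    ]
    return {
        "molecular formula": rdMolDescriptors.CalcMolFormula(mol),
        "number of rings": rdMolDescriptors.CalcNumRings(mol),
        "number of aromatic rings": rdMolDescriptors.CalcNumAromaticRings(mol),
        "functional groups": groups,
    }


def build_prompt(smiles: str, structure: Dict[str, object], question: str) -> str:
    """Place the extracted structure in front of the question (hypothetical layout)."""
    lines = [f"Molecule (SMILES): {smiles}", "Structural information:"]
    for key, value in structure.items():
        if isinstance(value, list):
            # Render lists as comma-separated text, or "none" when empty.
            value = ", ".join(value) if value else "none"
        lines.append(f"- {key}: {value}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

With RDKit installed, calling `build_prompt(smiles, extract_structure(smiles), "Describe this molecule.")` yields a prompt in which the structural facts precede the question, so the answering model does not have to infer them from the raw SMILES string. When the target molecule is unknown, no such extraction is possible up front, and the structural information would have to come from the model's own reasoning instead.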
