MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs

Guojiang Zhao, Zixiang Lu, Yutang Ge, Sihang Li, Zheng Cheng, Haitao Lin, Lirong Wu, Hanchen Xia, Hengxing Cai, Wentao Guo, Hongshuai Wang, Mingjun Xu, Siyu Zhu, Guolin Ke, Linfeng Zhang, Zhifeng Gao

TL;DR

MolReasoner tackles the challenge of molecular reasoning in LLMs by introducing a two-stage framework that first bootstraps reasoning with knowledge-guided CoT data (Mol-SFT) and then calibrates outputs via multi-dimensional reinforcement learning (Mol-RL) using Group Relative Policy Optimization (GRPO). The approach represents molecules as SELFIES strings to promote chemical validity and uses a composite reward to reduce hallucinations while aligning language outputs with molecular structure. It achieves state-of-the-art results on molecule captioning and text-based de novo molecule generation and generalizes to out-of-distribution data. The method yields more interpretable reasoning chains and robust, chemically coherent outputs, addressing both fidelity and interpretability. Limitations include potential biases from synthetic CoT data, omission of some chemical properties, and high computational cost, which point future work toward greater efficiency and coverage of additional chemical properties.
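GRPO (Group Relative Policy Optimization) samples several completions per prompt and scores each one relative to its own group, so the group mean acts as the baseline instead of a learned value network. The sketch below shows only that advantage step, assuming one scalar reward per completion; the clipped policy-gradient objective and KL penalty are omitted, and the function name and reward values are illustrative, not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style outcome advantages for one prompt's group of sampled completions.

    Each reward is centered on the group mean and scaled by the group standard
    deviation, so the group itself serves as the baseline (no value network).
    """
    if len(rewards) < 2:
        raise ValueError("GRPO needs at least two samples per prompt")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical composite rewards for G = 4 completions of one molecule prompt.
advantages = group_relative_advantages([0.9, 0.4, 0.4, 0.1])
```

Completions that beat their group average receive a positive advantage and are reinforced; if every completion in a group earns the same reward, all advantages are zero and that group contributes no gradient.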

Abstract

Large Language Models (LLMs) have shown impressive performance across various domains, but their ability to perform molecular reasoning remains underexplored. Existing methods mostly rely on general-purpose prompting, which lacks domain-specific molecular semantics, or fine-tuning, which faces challenges in interpretability and reasoning depth, often leading to structural and textual hallucinations. To address these issues, we introduce MolReasoner, a two-stage framework that transitions LLMs from memorization to high-fidelity chemical reasoning. In the Mol-SFT stage, knowledge-enhanced Chain-of-Thought (CoT) data provides a strong foundation, while the Mol-RL stage refines reasoning using a novel, task-adaptive reward system to mitigate hallucinations. Extensive evaluations demonstrate that MolReasoner significantly outperforms a wide range of strong baselines in both molecule generation and captioning tasks. Further analyses highlight the framework's synergistic design and its ability to produce more interpretable outputs. Our work presents a principled and effective new approach for advancing high-fidelity molecular reasoning.
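The abstract describes a task-adaptive reward for Mol-RL but does not list its terms here. The following is a minimal sketch of how such a reward could combine per-task components, assuming a `<think>`/`<answer>` output template, a token-overlap F1 as a stand-in for text or structure similarity, and a caller-supplied validity check for generated SELFIES. All tag names, weights, and helper names are hypothetical and are not the paper's actual reward.

```python
import re
from collections import Counter

# Hypothetical output template: reasoning in <think>...</think>, final answer in <answer>...</answer>.
FORMAT_RE = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)

# Hypothetical per-task weights; the paper's actual reward terms and weights may differ.
WEIGHTS = {
    "captioning": {"format": 0.1, "text": 0.9},
    "generation": {"format": 0.1, "valid": 0.3, "text": 0.6},
}


def extract_answer(completion):
    """Return the answer span if the completion follows the assumed template, else None."""
    match = FORMAT_RE.match(completion.strip())
    return match.group(1).strip() if match else None


def tokenize(text, task):
    """SELFIES symbols for generation (e.g. [C], [=O]); lowercase words for captioning."""
    if task == "generation":
        return re.findall(r"\[[^\]]+\]", text)
    return text.lower().split()


def token_f1(pred_tokens, ref_tokens):
    """Multiset token F1, a cheap stand-in for BLEU/ROUGE or fingerprint similarity."""
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def composite_reward(task, completion, reference, is_valid=None):
    """Weighted sum of reward terms selected by task; malformed outputs score 0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    scores = {
        "format": 1.0,
        "text": token_f1(tokenize(answer, task), tokenize(reference, task)),
    }
    if task == "generation":
        # is_valid is a caller-supplied chemistry check (e.g. decoding SELFIES with a toolkit).
        scores["valid"] = 1.0 if is_valid is not None and is_valid(answer) else 0.0
    return sum(weight * scores[name] for name, weight in WEIGHTS[task].items())
```

Keeping the weights in a per-task table is one way to make a single reward function "task-adaptive": captioning and generation share the format term but weight content and validity differently.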

Paper Structure

This paper contains 38 sections, 10 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Examples of text-based molecule generation. (a) Prompt-based methods often hallucinate and yield chemically invalid molecules due to a lack of chemistry-specific adaptation. (b) Fine-tuning without explicit reasoning encourages memorization over generalization, reducing interpretability. (c) MolReasoner provides structure-grounded Chain-of-Thought reasoning, yielding interpretable and chemically valid candidates.
  • Figure 2: MolReasoner is a two-stage training framework: (1) Mol-SFT initially utilizes molecule–text pairs, augmented by reasoning trajectories generated via GPT-4o, to bootstrap reasoning capabilities; and (2) Mol-RL subsequently refines the reasoning ability through a carefully designed reward function that encourages precise alignment between molecular structures and their corresponding textual descriptions.
  • Figure 3: Performance of all models on five key evaluation metrics for text-based de novo molecule generation. For easier comparison, each score is normalized by dividing it by MolReasoner's score on the same metric.
  • Figure 4: Individual Reward Ablation for Molecule Captioning.
  • Figure 5: Impact of Knowledge-Guided CoT Data.
  • ...and 7 more figures
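Figure 3 reports every model's scores relative to MolReasoner. A minimal sketch of that normalization, assuming scores are stored as a model-to-metric mapping; the five metrics used in the figure are not named in this summary, so the metric names and numbers below are placeholders, not results from the paper.

```python
def normalize_to_reference(scores, reference="MolReasoner"):
    """Divide every model's score on each metric by the reference model's score.

    scores maps model name -> {metric name -> value}. The reference row becomes 1.0
    everywhere; for a higher-is-better metric, a value below 1.0 means the model
    trails the reference.
    """
    ref = scores[reference]
    for metric, value in ref.items():
        if value == 0:
            raise ValueError(f"reference score for {metric!r} is zero")
    return {
        model: {metric: value / ref[metric] for metric, value in metrics.items()}
        for model, metrics in scores.items()
    }


# Illustrative placeholder numbers only, not results from the paper.
example = {
    "MolReasoner": {"BLEU": 0.80, "Validity": 1.00},
    "Baseline-A": {"BLEU": 0.40, "Validity": 0.90},
}
normalized = normalize_to_reference(example)
```

For lower-is-better metrics (for example, an edit distance), the same ratio reads in the opposite direction, so such metrics would need to be inverted before plotting them alongside higher-is-better ones.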