AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models

Xuan Lin, Long Chen, Yile Wang

TL;DR

AttriLens-Mol presents an attribute-guided reinforcement learning framework for molecular property prediction with LLMs. By combining a structured attribute template with format, count, and rationality rewards, the method elicits relevant molecular attributes during reasoning and validates them with RDKit and external LLMs. Using GRPO or DAPO, the approach achieves competitive or superior results on in-distribution and out-of-distribution tasks with 7B backbones, and yields improved interpretability via decision-tree models built on extracted attributes. The work demonstrates that enforcing attribute-centric reasoning improves both accuracy and transparency, and provides code for reproducibility. This advances practical, scalable molecular property prediction with lightweight LLMs.

Abstract

Large Language Models (LLMs) have shown promise in assisting molecular property prediction tasks but often rely on human-crafted prompts and chain-of-thought templates. While recent advanced large reasoning models like DeepSeek-R1 employ reinforcement learning for an extended "thinking" process, their reasoning can be verbose and lack relevance. We introduce AttriLens-Mol, an attribute-guided reinforcement learning framework for molecular property prediction with LLMs. AttriLens-Mol steers the model's reasoning by using: (1) a format reward encouraging attribute-based structured output, (2) a count reward to avoid enumerating irrelevant attributes, and (3) a rationality reward using advanced LLMs and RDKit to verify the relatedness of the generated attributes. This approach implicitly elicits the model's inherent knowledge of relevant molecular attributes during reasoning, enabling more effective prediction of the molecular property. Experiments on both in-distribution and out-of-distribution datasets show that training the 7B-size R1-Distilled-Qwen2.5 and R1-Distilled-LLaMA3.1 models on 4,000 samples with AttriLens-Mol significantly boosts performance, yielding results comparable to or better than those of supervised fine-tuning models (Mol-Instructions, ChemDFM, etc.) and advanced models (GPT-3.5, GPT-4o, DeepSeek-V3, DeepSeek-R1, etc.). Further, our extracted attributes for the target property, when used as features for an interpretable decision tree model, yield superior performance compared to attributes generated by prompting LLMs. This shows that AttriLens-Mol effectively elicits more relevant and predictive molecular attributes, leading to enhanced interpretability and performance for property prediction. We release the code at https://github.com/szu-tera/AttriLens-Mol.
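
To make the reward design in the abstract concrete, the following is a minimal sketch of how rule-based format, count, and correctness rewards might be computed and summed with an externally supplied rationality score. The tag template (`<think>`, `<attr>`, `<answer>`), the attribute-count bounds `lo` and `hi`, the 0/1 reward values, and all function names are assumptions made for illustration; they are not taken from the paper or its released code. The rationality reward, which the paper computes with advanced LLMs and RDKit, is passed in here as a plain number.

```python
import re

# Hypothetical output template (an assumption, not the paper's actual one):
#   <think><attr>name: value</attr>...</think><answer>Yes|No</answer>
RESPONSE_RE = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)
ATTR_RE = re.compile(r"<attr>\s*([^:<]+?)\s*:\s*([^<]+?)\s*</attr>")


def parse(response):
    """Return (list of (name, value) pairs, answer string), or None if malformed."""
    m = RESPONSE_RE.search(response)
    if m is None:
        return None
    return ATTR_RE.findall(m.group(1)), m.group(2).strip()


def format_reward(response):
    """1.0 if the response follows the template and lists at least one attribute."""
    parsed = parse(response)
    return 1.0 if parsed is not None and len(parsed[0]) > 0 else 0.0


def count_reward(response, lo=3, hi=10):
    """1.0 if the number of attributes lies in [lo, hi]; the bounds are illustrative."""
    parsed = parse(response)
    if parsed is None:
        return 0.0
    return 1.0 if lo <= len(parsed[0]) <= hi else 0.0


def correct_reward(response, label):
    """1.0 if the final answer matches the gold label (case-insensitive)."""
    parsed = parse(response)
    return 1.0 if parsed is not None and parsed[1].lower() == label.lower() else 0.0


def total_reward(response, label, rationality=0.0):
    """Unweighted sum of the rewards; the paper's weighting may differ."""
    return (format_reward(response) + count_reward(response)
            + correct_reward(response, label) + rationality)


good = ("<think><attr>LogP: 2.1</attr><attr>TPSA: 40.5</attr>"
        "<attr>MolWt: 180.2</attr></think><answer>Yes</answer>")
print(parse(good))
print(total_reward(good, "yes", rationality=0.5))
```

In this sketch a response that ignores the template earns nothing from any rule-based term, while a well-formed response that lists too few or too many attributes keeps its format reward but loses the count reward.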

Paper Structure

This paper contains 24 sections, 7 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Examples comparing different methods. (a) A task-specific model generates label probabilities for a given molecule. (b) The verbal input for language models. (c) An incorrect response by LLaMA-3.1. (d) A correct response by our 7B model with attribute-guided reinforcement learning, which offers useful attribute information during reasoning and aids the final property prediction.
  • Figure 2: Illustration of our AttriLens-Mol reinforcement learning. $\mathcal{R}^{\text{format}}$ and $\mathcal{R}^{\text{correct}}$ are rule-based general rewards (Section \ref{dsreward}); $\mathcal{R}^{\text{count}}$ and $\mathcal{R}^{\text{rational}}$ are our attribute-based rewards for molecular property prediction (Section \ref{ourreward}).
  • Figure 3: Illustration of calculating the attribute rationality reward $\mathcal{R}^{\text{rational}}$ using external advanced LLMs and a cheminformatics toolkit to check the alignment of each individual attribute value.
  • Figure 4: The curves of different rewards during GRPO and DAPO training.
  • Figure 5: Accuracy and tokens used after tokenization for baselines and ours (in red). Q: Qwen2.5. L: LLaMA3.1. G: GRPO. D: DAPO. CoT: with chain-of-thought prompting.
  • ...and 5 more figures
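
As background for the reward curves tracked during GRPO training (Figure 4), the sketch below shows the group-relative advantage that GRPO-style methods derive from scalar rewards: each sampled response is scored against the mean and standard deviation of its own sampling group. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration and are not taken from the paper's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps); eps is an illustrative stabilizer
    that avoids division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by the summed rewards.
advantages = group_relative_advantages([3.0, 1.0, 1.0, 3.0])
print(advantages)
```

A group in which every response receives the same reward produces all-zero advantages, so it contributes no learning signal under this normalization.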