RLDBF: Enhancing LLMs Via Reinforcement Learning With DataBase FeedBack

Weichen Dai, Zijie Dai, Zhijie Huang, Yixuan Pan, Xinhe Li, Xi Li, Yi Zhou, Ji Qi, Wu Jiang

TL;DR

This work addresses two problems: LLMs make little use of structured scientific data, and they struggle with precise numerical reasoning in domain tasks. It introduces RLDBF (Reinforcement Learning with Database Feedback), a framework that brings structured database information into LLM training across continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). The RL stage uses Direct Preference Optimization to exploit ground-truth numeric values without heavy human labeling. The authors validate RLDBF on chemical data from PubChem and on the ChemBench benchmark, reporting better generalization and robustness to perturbations than general-purpose and domain-specific baselines, with gains in both property prediction and complex chemical reasoning. By reducing annotation costs and letting the model absorb knowledge directly from databases, RLDBF points to a scalable way of applying LLMs to AI for Science across structured-data domains.

Abstract

While current large language models (LLMs) demonstrate remarkable linguistic capabilities through training on massive unstructured text corpora, they remain inadequate at leveraging structured scientific data (e.g., chemical molecular properties in databases) that encapsulates centuries of accumulated scientific expertise. These structured datasets hold strategic significance for advancing AI for Science, yet current approaches treat them merely as auxiliary supplements to unstructured text. This study pioneers a systematic investigation into enhancing LLMs with structured scientific data, using chemical molecular science as a testbed. We investigate the impact of incorporating molecular property data on LLMs across distinct training phases, including continual pre-training, supervised fine-tuning, and reinforcement learning. Notably, to address the inherent numerical insensitivity of large models, we propose a methodology termed "Reinforcement Learning with Database Feedback" (RLDBF). Experimental evaluations demonstrate the efficacy of the proposed approach, with the model generalizing well to previously unseen data and to other chemical tasks. The results substantiate the potential of our method for advancing structured scientific data processing within LLMs.
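
To make the idea of database feedback concrete, the following is a minimal sketch of how a ground-truth numeric value from a database record could be turned into a preference pair for preference-based training such as DPO. It is an illustration under assumptions, not the authors' implementation: the `PreferencePair` type, the `make_preference_pair` function, the prompt wording, and the 5% to 20% relative perturbation range are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference example: a prompt, a preferred answer, a dispreferred answer."""
    prompt: str
    chosen: str
    rejected: str


def make_preference_pair(name, prop, true_value, rng, max_rel_error=0.2):
    """Build a preference pair from a single database value (hypothetical scheme).

    The chosen answer states the database value; the rejected answer states a
    value perturbed by a relative error between 5% and max_rel_error.
    """
    sign = rng.choice([-1.0, 1.0])
    scale = rng.uniform(0.05, max_rel_error)
    if true_value == 0:
        # A relative perturbation of zero is still zero, so use an absolute offset.
        wrong_value = sign * scale
    else:
        wrong_value = true_value * (1.0 + sign * scale)
    prompt = f"What is the {prop} of {name}?"
    chosen = f"The {prop} of {name} is {true_value:.6g}."
    rejected = f"The {prop} of {name} is {wrong_value:.6g}."
    return PreferencePair(prompt, chosen, rejected)


# Illustrative record; in practice the values would come from a database such as PubChem.
rng = random.Random(0)
pair = make_preference_pair("water", "molecular weight", 18.015, rng)
print(pair.prompt)
print(pair.chosen)
print(pair.rejected)
```

Because the label comes from the database itself, no human annotator is needed to decide which answer is preferred, which is the cost saving the summary above refers to.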

Paper Structure

This paper contains 31 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of the working pipeline.
  • Figure 2: Overview of RLDBF.
  • Figure 3: Examples for CPT & SFT data.
  • Figure 4: Examples for RL data.