Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced(R$^2$)GRPO

Ran Li, Shimin Di, Yuchen Liu, Chen Jing, Yu Qiu, Lei Chen

TL;DR

This work investigates how post-training strategies influence reasoning and memorization in Scientific Information Extraction (SciIE). It introduces MimicSFT, a structured reasoning approach that does not require high-quality chain-of-thought data, and R2GRPO, a two-stage reinforcement learning method with a composite reward that emphasizes relevance and rule-induction. Experiments on SciER and out-of-domain data show that R2GRPO, especially when paired with MimicSFT, yields state-of-the-art performance among reasoning models and competitive results with supervised models, particularly for relation extraction. The results support the claim that combining memory-based fine-tuning with hierarchical, constrained reasoning enhances both knowledge integration and robust generalization in SciIE, with practical implications for building more capable IE systems.

Abstract

A previous study suggests that powerful Large Language Models (LLMs) trained with Reinforcement Learning with Verifiable Rewards (RLVR) only refine the reasoning path without improving reasoning capacity in math tasks, whereas supervised fine-tuning (SFT) with distillation can. We study this from the perspective of Scientific Information Extraction (SciIE), where LLMs and reasoning LLMs underperform small BERT-based models. SciIE requires both reasoning and memorization. We argue that both SFT and RLVR can refine the reasoning path and improve reasoning capacity in a simple way based on SciIE. We propose two-stage training with (1) MimicSFT, which uses structured reasoning templates without needing high-quality chain-of-thought data, and (2) R$^2$GRPO with relevance and rule-induced rewards. Experiments on scientific IE benchmarks show that both methods can improve reasoning capacity. R$^2$GRPO with MimicSFT surpasses baseline LLMs and specialized supervised models in relation extraction. Our code is available at https://github.com/ranlislz/R2GRPO.

Paper Structure

This paper contains 22 sections, 11 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Our two-stage training for scientific IE (right) and the performance gain (left)
  • Figure 2: Best F1@K scores representing the reasoning capacity and Avg@K scores representing the reasoning ability for NER and RE on SciER (small).
  • Figure 3: Best F1@K scores representing the reasoning capacity and Avg@K scores representing the reasoning ability for NER and RE on OOD (small).
  • Figure 4: Performance vs. temperature
  • Figure 5: Response length (a) and reward (b) vs. training steps for R$^2$GRPO
  • ...and 2 more figures
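
The captions for Figures 2 and 3 report Best F1@K and Avg@K. The sketch below shows one plausible way to compute such scores. It assumes that K responses are sampled per input, that Best F1@K is the highest F1 among them, and that Avg@K is their mean F1. This page does not confirm those definitions, and the function names and the (span, type) tuple format are hypothetical.

```python
from statistics import mean


def f1_score(predicted, gold):
    """Set-level F1 between predicted and gold items (exact match)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_and_avg_f1_at_k(samples, gold):
    """Return (best F1, mean F1) over K sampled extractions for one input.

    Assumed reading: the best score reflects what the model can reach
    within K attempts, the mean reflects what it produces on average.
    """
    if not samples:
        raise ValueError("need at least one sampled response (K >= 1)")
    scores = [f1_score(sample, gold) for sample in samples]
    return max(scores), mean(scores)


# Hypothetical example: entities as (span, type) tuples, K = 3 samples.
gold_entities = {("BERT", "Method"), ("SciER", "Dataset")}
sampled_entities = [
    [("BERT", "Method")],
    [("BERT", "Method"), ("SciER", "Dataset")],
    [],
]
best_f1, avg_f1 = best_and_avg_f1_at_k(sampled_entities, gold_entities)
```

Under this reading, a wide gap between the two numbers means the model can produce a correct extraction but does not do so reliably.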