
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

Jie Zhu, Qian Chen, Huaixia Dou, Junhui Li, Lifan Guo, Feng Chen, Chi Zhang

TL;DR

Financial reasoning remains challenging for large language models because it demands domain-specific knowledge, numerical precision, and adherence to regulatory constraints. The authors introduce DianJin-R1, a reasoning-augmented framework that combines structured supervision with Group Relative Policy Optimization (GRPO). It trains models to produce explicit reasoning paths alongside accurate answers, using data drawn from CFLUE, FinQA, and CCC. On the financial benchmarks (CFLUE, FinQA, and CCC) and the general reasoning benchmarks (MATH-500 and GPQA-Diamond), DianJin-R1 consistently outperforms its non-reasoning counterparts, and the 32B variant achieves top performance on CCC with a single model call. The work demonstrates a scalable approach to real-world financial reasoning and compliance assessment, with implications for interpretable decision support and possible tool-augmented extensions in future work.

Abstract

Effective reasoning remains a core challenge for large language models (LLMs) in the financial domain, where tasks often require domain-specific knowledge, precise numerical calculations, and strict adherence to compliance rules. We propose DianJin-R1, a reasoning-enhanced framework designed to address these challenges through reasoning-augmented supervision and reinforcement learning. Central to our approach is DianJin-R1-Data, a high-quality dataset constructed from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance Check, CCC), combining diverse financial reasoning scenarios with verified annotations. Our models, DianJin-R1-7B and DianJin-R1-32B, are fine-tuned from Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct, respectively, using a structured format that generates both reasoning steps and final answers. To further refine reasoning quality, we apply Group Relative Policy Optimization (GRPO), a reinforcement learning method, with dual reward signals: one encouraging structured outputs and another rewarding answer correctness. We evaluate our models on five benchmarks: three financial datasets (CFLUE, FinQA, and CCC) and two general reasoning benchmarks (MATH-500 and GPQA-Diamond). Experimental results show that DianJin-R1 models consistently outperform their non-reasoning counterparts, especially on complex financial tasks. Moreover, on the real-world CCC dataset, our single-call reasoning models match or even surpass the performance of multi-agent systems that incur significantly higher computational cost. These findings demonstrate the effectiveness of DianJin-R1 in enhancing financial reasoning through structured supervision and reward-aligned learning, offering a scalable and practical solution for real-world applications.
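The abstract names two GRPO reward signals (structured output and answer correctness) without spelling them out. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The `<think>`/`<answer>` tag layout, the exact-string answer match, and the equal-weight sum of the two rewards are all assumptions made for illustration, and every function name is hypothetical. The advantage step follows the standard GRPO idea: each sampled completion's reward is normalized against the mean and standard deviation of its own sampling group (population standard deviation here), so no learned value model is needed.

```python
import re
import statistics

# Assumption: reasoning goes in <think>...</think>, the final answer in <answer>...</answer>.
FORMAT_PATTERN = re.compile(
    r"^\s*<think>.+?</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Hypothetical structure reward: 1.0 if the assumed tag layout is followed, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical correctness reward: exact match of the extracted answer (whitespace-trimmed)."""
    match = FORMAT_PATTERN.match(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    # Assumption: the two signals are summed with equal weight.
    return format_reward(completion) + accuracy_reward(completion, reference)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # A toy group of four sampled completions for one question whose reference answer is "B".
    group = [
        "<think>Compare the two ratios.</think><answer>B</answer>",
        "<think>Guess.</think><answer>C</answer>",
        "B",  # correct letter, but no reasoning structure
        "<think>Check clause 3.</think> <answer> B </answer>",
    ]
    rewards = [total_reward(c, "B") for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this toy group, the well-structured correct completions receive positive advantages, while the unstructured bare answer receives the most negative one even though its letter is right, which illustrates why a separate format reward is useful when the goal is explicit reasoning paths.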


Paper Structure

This paper contains 29 sections, 1 equation, 9 figures, 6 tables.

Figures (9)

  • Figure 1: An example of reasoning data synthesized by a multi-agent system.
  • Figure 2: Illustration of two-step training for DianJin-R1.
  • Figure 3: An example of converting a multiple-choice question from CFLUE into an open-ended format.
  • Figure 4: Prompt used to convert a multiple-choice question from CFLUE into an open-ended question.
  • Figure 5: Prompt used to generate answers for single-answer multiple-choice questions in CFLUE.
  • ...and 4 more figures