Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance

Lingfei Qian, Weipeng Zhou, Yan Wang, Xueqing Peng, Han Yi, Yilun Zhao, Jimin Huang, Qianqian Xie, Jian-yun Nie

TL;DR

This work tackles the unique challenges of financial reasoning in LLMs by introducing three resources: FinCoT, a high-fidelity CoT dataset built through domain-guided iterative refinement and difficulty-aware filtering; Fin-o1, open-source financial reasoning models trained with a two-stage SFT+RL framework; and FinReason, a comprehensive benchmark for multi-table, long-context, and equation-based tasks. It also provides an empirical study comparing PPO, DPO, and GRPO in the financial domain, showing that GRPO yields reliable gains while scale alone and general-purpose reasoning approaches struggle to adapt to finance. The results show that Fin-o1, trained on FinCoT, outperforms larger general models and existing financial reasoning baselines, highlighting the importance of domain-specific data and optimization strategies. The work offers practical insights for developing robust financial reasoning systems and points to future directions in domain adaptation, multi-table reasoning, and long-context processing.

Abstract

As the fundamental capability behind decision-making in finance, financial reasoning poses distinct challenges for LLMs. Although reinforcement learning (RL) has boosted generic reasoning, progress in finance is hindered by the absence of an empirical study on building an effective financial chain-of-thought (CoT) corpus, a systematic comparison of different RL methods, and comprehensive benchmarks. To address these gaps, we introduce FinCoT, the first open high-fidelity CoT corpus for finance, distilled from seven QA datasets by a novel three-stage pipeline that incorporates domain supervision, iterative LLM refinement, and difficulty-aware filtering. Based on FinCoT, we develop Fin-o1, the first open financial reasoning models trained via supervised fine-tuning and GRPO-based RL. Our models outperform existing financial reasoning models and SOTA general models such as GPT-o1, DeepSeek-R1, and GPT-4.5. We also investigate the effectiveness of three different RL methods in improving domain-specific reasoning, offering the first such empirical study. Finally, we propose FinReason, the first financial reasoning benchmark covering multi-table analysis, long-context reasoning, and equation-based tasks, and evaluate 29 LLMs on it. Our extensive experiments reveal that general reasoning models excel on standard benchmarks yet degrade noticeably in financial contexts; even finance-tuned models like Dianjin-R1 and FinR1 degrade on lengthy documents. In contrast, our Fin-o1 models consistently outperform their backbones and the larger GPT-o1 and DeepSeek-R1, confirming the effectiveness of our data building and model training strategy. Our study further shows that GRPO yields reliable gains whereas PPO and DPO do not, highlighting the need for targeted data and optimisation rather than scale alone.
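
To make the GRPO comparison above more concrete, the following is a minimal sketch of the group-relative advantage that distinguishes GRPO from PPO: several answers are sampled for the same question, and each answer's reward is normalized against the mean and standard deviation of its own group, so no learned value model is needed. The reward function, the tolerance, the sample values, and the use of the population standard deviation are illustrative assumptions, not the paper's actual training setup.

```python
from statistics import mean, pstdev


def exact_match_reward(predicted, gold, tol=1e-6):
    """Hypothetical reward: 1.0 if the numeric answer matches the gold value."""
    return 1.0 if abs(predicted - gold) <= tol else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group of answers to the same question.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every answer in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical sampled answers to one financial question.
gold_answer = 12.5
sampled_answers = [12.5, 13.0, 12.5, 11.0]

rewards = [exact_match_reward(a, gold_answer) for a in sampled_answers]
advantages = group_relative_advantages(rewards)
```

Correct answers in the group receive a positive advantage and incorrect ones a negative advantage; when every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.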

Paper Structure

This paper contains 32 sections, 3 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Overall framework for developing FinCoT, Fin-o1, and FinReason.
  • Figure 2: Workflow of curating the combined question and examples of program progress.
  • Figure 3: Example of iterative refinement of the CoT process with guidance.
  • Figure 4: Error case 1.
  • Figure 5: Error case 2.