One More Question is Enough, Expert Question Decomposition (EQD) Model for Domain Quantitative Reasoning

Mengyu Wang, Sotirios Sabanis, Miguel de Carvalho, Shay B. Cohen, Tiejun Ma

TL;DR

This work tackles domain-specific quantitative reasoning by introducing Expert Question Decomposition (EQD), a two-step fine-tuning framework that combines domain adaptation with QA alignment. It first fine-tunes a lightweight model on a financial dialogue corpus to learn concise question decomposition, then optimizes the decomposer with PPO using an answer-comparison reward: $r = c(a_{qd}) \cdot (1 + 0.5 \cdot |c(a_{di}) - c(a_{qd})|)$, where $c(a)$ indicates correctness and the four discrete rewards are $+2$, $+1$, $-1$, and $-2$. The approach achieves QA improvements of $0.6\%$ to $10.5\%$ across four financial benchmarks and multiple LLMs, while maintaining inference efficiency close to zero-shot prompting. A key finding is that a single concise supporting question can outperform detailed reasoning steps, offering practical efficiency and robustness advantages for domain QA. The method relies on a small decomposition dataset and a single GPU, and demonstrates strong generalization across model sizes, suggesting potential applicability to other specialized domains beyond finance.
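The reward above can be checked with a short sketch. It assumes that $c(a)$ takes the value $+1$ for a correct answer and $-1$ for an incorrect one (the encoding under which the formula yields exactly the four stated rewards), and it reads $a_{qd}$ as the answer obtained with the generated sub-question and $a_{di}$ as the answer obtained by asking the question directly. The function names and the numeric tolerance are hypothetical, chosen only for illustration.

```python
import math


def correctness(answer: float, gold: float, rel_tol: float = 1e-2) -> int:
    """Hypothetical correctness check: +1 if the answer matches the gold
    value within a relative tolerance, otherwise -1 (assumed encoding)."""
    return 1 if math.isclose(answer, gold, rel_tol=rel_tol) else -1


def eqd_reward(c_qd: int, c_di: int) -> float:
    """Answer-comparison reward r = c(a_qd) * (1 + 0.5 * |c(a_di) - c(a_qd)|).

    c_qd: correctness (+1 / -1) of the answer given with the sub-question.
    c_di: correctness (+1 / -1) of the answer given without it.
    """
    return c_qd * (1 + 0.5 * abs(c_di - c_qd))


# Enumerate the four possible outcomes.
for c_qd in (1, -1):
    for c_di in (1, -1):
        print(f"c_qd={c_qd:+d}, c_di={c_di:+d} -> r={eqd_reward(c_qd, c_di):+.0f}")
```

Under these assumptions, the reward is $+2$ when the sub-question turns a wrong direct answer into a correct one, $+1$ when both answers are correct, $-1$ when both are wrong, and $-2$ when the sub-question turns a correct direct answer into a wrong one.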

Abstract

Domain-specific quantitative reasoning remains a major challenge for large language models (LLMs), especially in fields requiring expert knowledge and complex question answering (QA). In this work, we propose Expert Question Decomposition (EQD), an approach designed to balance the use of domain knowledge with computational efficiency. EQD is built on a two-step fine-tuning framework and guided by a reward function that measures the effectiveness of generated sub-questions in improving QA outcomes. It requires only a few thousand training examples and a single A100 GPU for fine-tuning, with inference time comparable to zero-shot prompting. Beyond its efficiency, EQD outperforms state-of-the-art domain-tuned models and advanced prompting strategies. We evaluate EQD in the financial domain, characterized by specialized knowledge and complex quantitative reasoning, across four benchmark datasets. Our method consistently improves QA performance by 0.6% to 10.5% across different LLMs. Our analysis reveals an important insight: in domain-specific QA, a single supporting question often provides greater benefit than detailed guidance steps.

Paper Structure

This paper contains 30 sections, 1 equation, 4 figures, 7 tables.

Figures (4)

  • Figure 1: A practical example comparing different QA processes. General LLMs struggle to give correct answers directly. The CoT method attempts to simplify the question, but often decomposes the query into overly detailed steps, introducing confusion. In contrast, our EQD model adds a single sub-question that effectively guides the LLM toward the correct answer.
  • Figure 2: Two-step training framework of the Expert Question Decomposition model.
  • Figure 3: Comparison of QD models fine-tuned differently, using Llama3.1 as QA model. Blue bars reflect QA accuracy (left y-axis), while red bars indicate the average word count of generated questions (right y-axis).
  • Figure 4: Comparison of inference time and input length across methods. Blue bars represent inference time (left y-axis), while red bars indicate the average word count of extra inputs (right y-axis).