ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering

Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, William Yang Wang

TL;DR

ConvFinQA introduces a finance-focused conversational QA benchmark that targets complex, long-range numerical reasoning over real financial reports. It couples a two-step dataset construction pipeline (flow-simulated reasoning and expert question composition) with a domain-specific DSL for reasoning programs, enabling rigorous evaluation of neural-symbolic versus prompting-based approaches. Across FinQANet and GPT-3 experiments, neural-symbolic models anchored by the FinQA DSL outperform prompting methods, yet all fall short of human experts, highlighting gaps in domain knowledge and long-horizon reasoning in current systems. The work provides a valuable resource and methodology for advancing real-world, complex numerical reasoning in finance, with implications for building more capable and trustworthy financial analysis agents.
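To make the idea of a reasoning program concrete, here is a minimal sketch of how a step-based arithmetic program could be executed. It assumes a FinQA-style surface syntax in which steps are comma-separated calls and `#k` refers to the result of step k; the operator table, the function name `run_program`, and the numbers in the usage note are illustrative assumptions, not the authors' code or data.

```python
import re

# Hypothetical operator table: a small illustrative subset,
# not the full operator set of the DSL described in the paper.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

# Matches one step such as "subtract(120, 100)" or "divide(#0, 100)".
STEP = re.compile(r"(\w+)\(([^()]*)\)")


def run_program(program: str) -> float:
    """Evaluate steps left to right; '#k' refers to the result of step k."""
    results = []
    for name, raw_args in STEP.findall(program):
        args = []
        for token in raw_args.split(","):
            token = token.strip()
            if token.startswith("#"):
                # Back-reference to an earlier step's result.
                args.append(results[int(token[1:])])
            else:
                args.append(float(token))
        results.append(OPS[name](*args))
    if not results:
        raise ValueError("program contains no steps")
    return results[-1]
```

For example, with made-up figures, `run_program("subtract(120, 100), divide(#0, 100)")` first computes a difference and then reuses it as `#0` to obtain a relative change. In a conversation, a later turn's program can chain on quantities introduced in earlier turns in the same way, which is what makes long dependency chains hard to model.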

Abstract

With recent advances in large pre-trained language models, researchers have achieved record performance on NLP tasks that mostly focus on language pattern matching. The community is experiencing a shift in the challenge, from how to model language to how to imitate complex, human-like reasoning abilities. In this work, we investigate the application domain of finance, which involves real-world, complex numerical reasoning. We propose a new large-scale dataset, ConvFinQA, aiming to study the chain of numerical reasoning in conversational question answering. Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations. We conduct comprehensive experiments and analyses with both neural-symbolic methods and prompting-based methods to provide insights into the reasoning mechanisms of these two approaches. We believe our new dataset should serve as a valuable resource to push forward the exploration of real-world, complex reasoning tasks as the next research focus. Our dataset and code are publicly available at https://github.com/czyssrs/ConvFinQA.

Paper Structure

This paper contains 37 sections, 2 equations, 9 figures, and 9 tables.

Figures (9)

  • Figure 1: An example from ConvFinQA: each question may depend on previous questions to answer.
  • Figure 2: The simulation process of conversation skeletons.
  • Figure 3: The question composition examples for the two types of conversations. For the hybrid conversation example, the annotator skips three turns and directly jumps to the last turn using references, making the conversation more natural.
  • Figure 4: Distribution of the longest dependency distances of the questions in ConvFinQA. Over 60% of the questions have longer dependencies with previous questions.
  • Figure 5: Performances for the nth conversation turn.
  • ...and 4 more figures