APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

Jiashuo Sun, Hang Zhang, Chen Lin, Xiangdong Su, Yeyun Gong, Jian Guo

TL;DR

APOLLO combines a number-aware negative sampling strategy, which helps the retriever discriminate key numerical facts, with consistency-based reinforcement learning and target program augmentation for the generator, which together increase execution accuracy.

Abstract

Long-form numerical reasoning in financial analysis aims to generate a reasoning program that calculates the correct answer to a given question. Previous work followed a retriever-generator framework, in which the retriever selects key facts from a long-form document and the generator produces a reasoning program based on the retrieved facts. However, these approaches treated all facts equally, without considering the different contributions of facts with and without numbers. Meanwhile, program consistency was ignored under supervised training, resulting in lower training accuracy and diversity. To solve these problems, we propose APOLLO to improve the long-form numerical reasoning framework. For the retriever, we adopt a number-aware negative sampling strategy that makes the retriever more discriminative on key numerical facts. For the generator, we design consistency-based reinforcement learning and a target program augmentation strategy, both based on the consistency of program execution results. Experimental results on the FinQA and ConvFinQA leaderboards verify the effectiveness of our proposed method, which achieves a new state of the art.
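The abstract does not spell out how number-aware negative sampling is implemented. The following is only a minimal sketch of one plausible reading, in which negatives for retriever training are drawn preferentially from non-gold facts that contain numbers. The function names, the `numeric_ratio` parameter, and the digit-based test for "contains a number" are all assumptions made for illustration, not the authors' method.

```python
import random
import re

# Assumption: a fact "contains a number" if it has at least one digit.
NUMBER_PATTERN = re.compile(r"\d")


def contains_number(fact: str) -> bool:
    return bool(NUMBER_PATTERN.search(fact))


def sample_negatives(facts, positive_indices, k, numeric_ratio=0.75, seed=0):
    """Pick up to k negative fact indices, favouring facts with numbers.

    numeric_ratio is a hypothetical knob: the target share of negatives
    that should be numerical facts.
    """
    rng = random.Random(seed)
    positives = set(positive_indices)
    candidates = [i for i in range(len(facts)) if i not in positives]
    numeric = [i for i in candidates if contains_number(facts[i])]
    other = [i for i in candidates if not contains_number(facts[i])]

    # First fill the numerical quota, then use non-numerical facts.
    n_numeric = min(len(numeric), round(k * numeric_ratio))
    chosen = rng.sample(numeric, n_numeric)
    chosen += rng.sample(other, min(len(other), k - len(chosen)))

    # Top up with remaining numerical facts if there were too few others.
    if len(chosen) < k:
        leftover = [i for i in numeric if i not in chosen]
        chosen += rng.sample(leftover, min(len(leftover), k - len(chosen)))
    return chosen


# Hypothetical document facts; index 0 is the gold (positive) fact.
facts = [
    "Revenue was 120 million in 2019.",
    "The company operates in Europe.",
    "Net income rose to 45 million.",
    "Management expects continued growth.",
    "Operating costs were 30 million.",
    "Shares outstanding: 10 million.",
]
negatives = sample_negatives(facts, positive_indices=[0], k=4)
```

With these inputs, three of the four sampled negatives are numerical facts, so the retriever would see mostly number-bearing distractors during training.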

Paper Structure

This paper contains 32 sections, 6 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: An example of long-form numerical reasoning. The parameters in the gold program come directly from the numerical facts (e.g., table column 2 and the textual fragment in green) rather than from the non-numerical fact (e.g., the textual fragment in red). The answer can be generated equally well from the gold program and from the consistent program. Const_x and #i denote the constant $x$ and the result of the previous $(i-1)^{th}$ operator, respectively.
  • Figure 2: The novel hard negative sampling strategy in APOLLO. The facts for the four methods are sampled from the same document as in Figure 1. We compare our sampling method with three commonly used conventional methods: Random, BM25, and Self-mining.
  • Figure 3: The overall architecture of the retriever-generator framework with APOLLO. $F_n$ and $F_p$ denote the negative and positive facts used for training, respectively, and $F_1,F_2,F_3$ represent the retrieved facts. We use the golden program in Figure 1 as an example. The left portion of the figure illustrates the retriever and the encoding process for the generator, while the right portion illustrates the complete process of generating the "EOF" token, implementing target program augmentation, and applying consistency-based reinforcement learning. The generator uses cross-entropy to supervise the generation of predicted programs, taking both the golden program and the programs produced by target program augmentation as references. APOLLO then samples a consistent program and executes it alongside the golden program to obtain the execution result and the golden result, which are used in Equation \ref{equation:rl} to calculate the consistency reward. This reward is then used to update all parameters.
  • Figure 4: The specific forms of the four target program augmentation constructions. Every program generated by target program augmentation produces the same result as the golden program.
  • Figure 5: Performance comparisons on the private test sets of FinQA and ConvFinQA. We report the scores of APOLLO, the best competitor, the Top 5 average, and the Top 10 average on both leaderboards. At the time of submission (25 Nov. 2022), APOLLO had achieved state-of-the-art results on both leaderboards.
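
The captions for Figures 1 and 3 describe programs whose steps can reference constants (Const_x) and earlier results (#i), and a reward computed by comparing the execution result of a sampled program with that of the golden program. The sketch below illustrates that idea under several assumptions: a FinQA-style textual program syntax, zero-based `#i` references to earlier step results, purely numeric `const_` values, and a simple binary reward. The actual reward equation is not reproduced in this section, and the numbers in the example programs are hypothetical.

```python
import re

# Assumed operator set; the real program language may define more.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

# Matches one step such as "divide(#0, 5735)".
STEP_PATTERN = re.compile(r"(\w+)\(([^()]*)\)")


def resolve(arg, results):
    """Turn an argument token into a number."""
    arg = arg.strip()
    if arg.startswith("#"):  # assumed: #i is the result of step i (0-based)
        return results[int(arg[1:])]
    if arg.lower().startswith("const_"):  # e.g. const_100 -> 100.0
        return float(arg[6:])
    return float(arg)


def execute(program):
    """Run every step in order and return the last step's result."""
    results = []
    for op, args in STEP_PATTERN.findall(program):
        a, b = (resolve(x, results) for x in args.split(","))
        results.append(OPS[op](a, b))
    return results[-1]


def consistency_reward(sampled, gold, tol=1e-6):
    """Assumed binary reward: 1.0 if both programs execute to the same value."""
    gold_result = execute(gold)
    try:
        sampled_result = execute(sampled)
    except (KeyError, ValueError, IndexError, ZeroDivisionError):
        return 0.0  # a program that fails to execute earns no reward
    return 1.0 if abs(sampled_result - gold_result) <= tol else 0.0


gold = "subtract(5829, 5735), divide(#0, 5735)"
consistent = "divide(5829, 5735), subtract(#0, const_1)"
inconsistent = "subtract(5735, 5829), divide(#0, 5735)"
```

Here `consistent` differs from `gold` in its operator sequence but executes to the same value, so it would be rewarded, while `inconsistent` would not. This is the sense in which a consistent program can stand in for the golden program during training.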