
Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data

Younghun Lee, Sungchul Kim, Ryan A. Rossi, Tong Yu, Xiang Chen

TL;DR

Learning to Reduce introduces an on-policy reinforcement learning framework that trains a language-model policy to generate reduced, task-relevant inputs for structured data QA. By separating column and row reduction and optimizing a reward that prioritizes including relevant evidence, the method achieves high context-reduction recall and generalizes to unseen data, while improving downstream table QA performance when used with large LLMs. The approach is model-agnostic and can serve as a pre-prompting primitive to enable better reasoning on long structured data, offering efficiency gains and robustness. Limitations include evaluation on a narrow set of table QA tasks and the need for more task-specific rewards; future work could broaden datasets and extend to other structured data domains.

Abstract

Large Language Models (LLMs) achieve competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data remains challenging for them. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and neither approach is trivial. This paper proposes a framework, Learning to Reduce, that fine-tunes a language model with On-Policy Learning to generate a reduced version of the input structured data. Compared to state-of-the-art LLMs like GPT-4, Learning to Reduce not only achieves outstanding performance in reducing the input, but also generalizes across different datasets. We further show that the model fine-tuned with our framework helps LLMs perform better on table QA tasks, especially when the context is longer.
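
To make the idea of a "reduced version" of a table concrete, here is a minimal sketch of what column and row reduction, and a reward that favors keeping relevant evidence, might look like. Everything in it is an assumption made for illustration: the `Table` layout, the `reduce_table` and `recall_weighted_reward` helpers, the toy data, and the use of an F-beta score with beta > 1 as the reward. This section does not give the paper's actual reward or model interface, so this should not be read as the authors' implementation.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical table layout: column name -> list of cell values.
Table = Dict[str, List[str]]


def reduce_table(table: Table, keep_columns: Sequence[str], keep_rows: Sequence[int]) -> Table:
    """Keep only the selected columns and row indices, preserving table order."""
    wanted = set(keep_columns)
    return {
        col: [cells[i] for i in keep_rows]
        for col, cells in table.items()
        if col in wanted
    }


def recall_weighted_reward(predicted: Set, relevant: Set, beta: float = 2.0) -> float:
    """F-beta score over selected items; beta > 1 weights recall above precision.

    This is an illustrative stand-in for a reward that prioritizes including
    relevant evidence, not the reward defined in the paper.
    """
    hits = len(predicted & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data (invented for this example).
table: Table = {
    "year": ["2008", "2012", "2016"],
    "city": ["Beijing", "London", "Rio"],
    "medals": ["100", "103", "121"],
}

# Question: "Which city hosted in 2012?" needs columns year/city and row 1.
reduced = reduce_table(table, keep_columns=["year", "city"], keep_rows=[1])

# Keeping one extra column is penalized less than dropping a needed one.
over_inclusive = recall_weighted_reward({"year", "city", "medals"}, {"year", "city"})
under_inclusive = recall_weighted_reward({"year"}, {"year", "city"})
```

With a recall-weighted score of this kind, a selection that includes an unnecessary column scores higher than one that omits a needed column, which matches the stated goal of not discarding evidence the downstream LLM needs.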

Paper Structure

This paper contains 15 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Inference with an original table (dotted arrow) and with a reduced table (solid arrow). Given an input question and a table, a language model (blue hexagon) learns a policy to generate the relevant rows and columns by receiving rewards. By learning the optimal policy, our model generates reduced tables, which lead the fixed LLM to perform more accurately on QA tasks.
  • Figure 2: Accuracy (precision) of the GPT-4 model on the WTQ test set with different input context tables. Reducing both rows and columns (red and purple) helps more when the context is longer.
  • Figure 3: Accuracy (precision) of GPT-3.5-turbo on the WTQ test set with different input context tables.