FReM: A Flexible Reasoning Mechanism for Balancing Quick and Slow Thinking in Long-Context Question Answering

Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Bin Liang, Binyang Li, Kam-Fai Wong

TL;DR

This paper tackles long-context QA by addressing the trade-off between quick pattern-matching and slow exhaustive reasoning. It introduces FReM, which synthesizes multiple reference demos with explicit reasoning paths and uses a multi-criteria selection mechanism to pick the most suitable chain of thought for each question. By focusing the model’s reasoning on the most relevant path, FReM improves reasoning accuracy and scalability, particularly for multi-hop and domain-specific questions, while avoiding unnecessary overhead on simple queries. Across seven diverse QA datasets, FReM consistently outperforms both quick-thinking and slow-thinking baselines, highlighting its potential to advance efficient and adaptive LCQA methods.

Abstract

Long-context question-answering (LCQA) systems have benefited greatly from the reasoning capabilities of large language models (LLMs), which can be categorized into slow and quick reasoning modes. However, both modes have limitations. Slow thinking tends to explore every possible reasoning path, which leads to heavy overthinking and wasted time. Quick thinking usually relies on pattern matching rather than on understanding the query logic, so it can miss what the question actually asks. To address these issues, we propose FReM (Flexible Reasoning Mechanism), a method that adjusts reasoning depth according to the complexity of each question. Specifically, FReM leverages synthetic reference QA examples to provide an explicit chain of thought, enabling efficient handling of simple queries while allowing deeper reasoning for more complex ones. By doing so, FReM helps quick-thinking models move beyond superficial pattern matching and narrows the reasoning space for slow-thinking models to avoid unnecessary exploration. Experiments on seven QA datasets show that FReM improves reasoning accuracy and scalability, particularly for complex multihop questions, indicating its potential to advance LCQA methodologies.
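To make the idea of choosing one reference chain of thought per question more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each synthetic demo carries a question, a list of reasoning steps, and an answer, and it scores demos with two hypothetical criteria (lexical relevance to the query and brevity of the reasoning path) combined by assumed weights. The names `Demo`, `token_overlap`, `score_demo`, and `select_demo`, as well as the criteria and weights, are illustrative assumptions; the paper's actual selection criteria are not given in this summary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Demo:
    """A synthetic reference QA example with an explicit reasoning path (hypothetical structure)."""
    question: str
    reasoning_steps: list[str]
    answer: str


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase whitespace tokens of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score_demo(query: str, demo: Demo, max_steps: int = 6,
               w_rel: float = 0.7, w_brev: float = 0.3) -> float:
    """Weighted sum of two assumed criteria: relevance and brevity."""
    relevance = token_overlap(query, demo.question)
    # Fewer reasoning steps give a higher brevity score, capped at max_steps.
    brevity = 1.0 - min(len(demo.reasoning_steps), max_steps) / max_steps
    return w_rel * relevance + w_brev * brevity


def select_demo(query: str, demos: list[Demo]) -> Demo:
    """Return the demo with the highest combined score for this query."""
    if not demos:
        raise ValueError("at least one demo is required")
    return max(demos, key=lambda d: score_demo(query, d))


if __name__ == "__main__":
    demos = [
        Demo("Who founded the company that makes the Pixel?",
             ["Find the maker of the Pixel.", "Find the founder of that company."],
             "placeholder answer"),
        Demo("What is the capital of France?",
             ["Look up the capital directly."],
             "placeholder answer"),
    ]
    best = select_demo("Who founded the company that makes the iPhone?", demos)
    print(best.reasoning_steps)
```

In this sketch a two-hop query picks the two-step demo because relevance outweighs brevity under the assumed weights, while a simple lookup query would favor the shorter path. A real system would likely replace the token overlap with a model-based similarity measure.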

Paper Structure

This paper contains 45 sections, 9 equations, 9 figures, 13 tables, 1 algorithm.

Figures (9)

  • Figure 1: Demonstration of multihop reasoning in different domains. (a) shows two LLMs under the multihop QA setting, (b) shows the single-step QA setting, and (c) shows LLMs with our proposed FReM framework.
  • Figure 2: Overview of the Reasoning-Understanding-Narrowing Mechanism. We leverage LLMs' synthesis ability to explore the best reasoning path for question answering.
  • Figure 3: Extended experimental results on three open-source models for HotpotQA.
  • Figure 4: Retrace analysis on NewsQA and HQA.
  • Figure 5: Impact of the number of synthetic demos on EM scores for HotpotQA (left) and NewsQA (right).
  • ...and 4 more figures