Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA

Rishabh Maheshwary, Masoud Hashemi, Khyati Mahajan, Shiva Krishna Reddy Malay, Sai Rajeswar, Sathwik Tejaswi Madhusudhan, Spandana Gella, Vikas Yadav

TL;DR

The paper addresses context overload and accumulated noise in iterative retrieval-augmented generation (RAG) for complex multi-hop QA. It introduces NotesWriting, a plug-and-play module in which a dedicated notes model extracts concise, query-relevant notes from the retrieved documents at each step, so the main LLM reasons over short notes rather than full documents. The authors integrate NotesWriting with ReAct (as ReNAct) and evaluate it with three iterative RAG methods, four datasets, and two LLMs. They report an average improvement of 15.6 percentage points, fewer reasoning steps, and only a minimal increase in output tokens. Because the module is framework agnostic, it may also apply to iterative RAG setups beyond the tested configurations.

Abstract

Iterative RAG for multi-hop question answering faces challenges with lengthy contexts and the buildup of irrelevant information. This hinders a model's capacity to process and reason over retrieved content and limits performance. While recent methods focus on compressing retrieved information, they are either restricted to single-round RAG, require fine-tuning, or lack scalability in iterative RAG. To address these challenges, we propose Notes Writing, a method that generates concise and relevant notes from retrieved documents at each step, thereby reducing noise and retaining only essential information. This indirectly increases the effective context length of Large Language Models (LLMs), enabling them to reason and plan more effectively while processing larger volumes of input text. Notes Writing is framework agnostic and can be integrated with different iterative RAG methods. We demonstrate its effectiveness with three iterative RAG methods, across two models and four evaluation datasets. Notes Writing yields an average improvement of 15.6 percentage points overall, with minimal increase in output tokens.
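
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_with_notes`, the three callables it takes, and the toy corpus are hypothetical names invented here, and the keyword-based stand-ins replace what would be a retriever and LLM calls in a real system.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions for this sketch, not taken from the paper).
Retriever = Callable[[str], List[str]]                   # query -> documents
NotesWriter = Callable[[str, List[str]], str]            # (query, documents) -> note
Planner = Callable[[str, List[str]], Tuple[str, str]]    # (question, notes) -> (action, text)


def run_with_notes(question: str, retrieve: Retriever, write_notes: NotesWriter,
                   plan: Planner, max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Iterative RAG loop in which the planner only ever sees notes, never raw documents."""
    notes: List[str] = []
    query = question
    for _ in range(max_steps):
        documents = retrieve(query)
        note = write_notes(query, documents)   # condense this step's documents
        if note:
            notes.append(note)
        action, text = plan(question, notes)   # decide: answer now or search again
        if action == "answer":
            return text, notes
        query = text                           # otherwise use the planner's next query
    return None, notes                         # step budget exhausted without an answer


# Toy stand-ins with fictional data so the sketch runs without any model.
TOY_CORPUS = {
    "Book X": "Book X was written by Alice Author. Book X has 300 pages.",
    "Alice Author": "Alice Author was born in Springfield. Alice Author likes tea.",
}


def toy_retrieve(query: str) -> List[str]:
    # Return every document whose title appears in the query.
    return [text for title, text in TOY_CORPUS.items() if title in query]


def toy_write_notes(query: str, documents: List[str]) -> str:
    # Keep only sentences containing a cue phrase; a real notes model would be an LLM.
    cues = ("written by", "born in")
    kept = [s.strip() for d in documents for s in d.split(".") if any(c in s for c in cues)]
    return ". ".join(kept)


def toy_plan(question: str, notes: List[str]) -> Tuple[str, str]:
    # Answer once a birthplace is known; otherwise search for the author found so far.
    joined = " ".join(notes)
    if "born in" in joined:
        return "answer", joined.split("born in ")[1].split(".")[0]
    if "written by" in joined:
        return "search", joined.split("written by ")[1].split(".")[0]
    return "search", question


if __name__ == "__main__":
    answer, notes = run_with_notes(
        "Where was the author of Book X born?", toy_retrieve, toy_write_notes, toy_plan
    )
    print(answer)
    print(notes)
```

In this sketch the irrelevant sentences (page count, tea) never reach the planner, which is the noise-reduction effect the abstract attributes to notes writing. Keeping the planner's input to accumulated notes is a design choice of the sketch. How the paper's methods combine notes with other context is not specified in this summary.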

Paper Structure

This paper contains 33 sections, 5 equations, 7 figures, 20 tables.

Figures (7)

  • Figure 1: Overview of NotesWriting within an iterative RAG framework.
  • Figure 2: Quality evaluation of ReAct and ReNAct reasoning chains.
  • Figure 3: Steps (smoothed) taken by ReNAct and ReAct vs. the ground-truth steps for GPT-4o-mini and Llama-3.1-70B.
  • Figure 4: Few-shot prompt used for the evaluation of IRCoT and FLARE methods.
  • Figure 5: Notes writing prompt for extracting the relevant information.
  • ...and 2 more figures