STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack

Shashank Kirtania, Naman Gupta, Priyanshu Gupta, Krishna Kariya, Sumit Gulwani, Arun Iyer, Suresh Parthasarathy, Arjun Radhakrishna, Sriram K. Rajamani, Gustavo Soares

TL;DR

STACKFEED presents a feedback-driven framework for editing knowledge bases within Retrieval-Augmented Generation. It uses a multi-actor ReACT-based architecture with a centralized critic to produce structured, document-level edits guided by expert feedback, navigated via Monte Carlo Tree Search. Formulated as a state-search problem, STACKFEED demonstrates improved KB coherence, completeness, and downstream QA accuracy across low-resource programming and real-world migration benchmarks, without modifying model parameters. The approach offers a practical path toward live KB maintenance in RAG systems, balancing interpretability and performance in high-stakes contexts.

Abstract

Large Language Models (LLMs) often generate incorrect or outdated information, especially in low-resource settings or when dealing with private data. To address this, Retrieval-Augmented Generation (RAG) uses external knowledge bases (KBs), but these can also suffer from inaccuracies. We introduce STACKFEED, a novel Structured Textual Actor-Critic Knowledge base editing with FEEDback approach that iteratively refines the KB based on expert feedback using a multi-actor, centralized-critic reinforcement learning framework. STACKFEED defines a ReACT actor agent on each document to perform structured edits based on document-specific, targeted instructions. Experimental results show that STACKFEED significantly improves KB quality and the performance of the RAG system. We evaluate STACKFEED on low-resource programming problems, modified Python packages, and factual question-answering tasks.

Paper Structure

This paper contains 26 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Example of the $\textsc{STACKFEED}$ pipeline in the ARKS Pony scenario. We explain the example in more detail in appendix \ref{sec:exampleoverview}.
  • Figure 2: a) MCTS (Monte Carlo Tree Search) planning for state search. The tree structure enables strategic planning for $\textsc{STACKFEED}$. b) A simplified state transition example. Upon receiving a reward from the environment (or expert) on the given state of the knowledge base (KB) $s_0$, a centralized critic ① generates a reflection on observed failures to calculate the textual gradient. The critic uses this reflection to select documents responsible for the error and ② assigns credit to actors in the form of document-wise reflections. The actors then iteratively edit the documents to reach state $s_4$.
  • Figure 3: The example above shows the edits made by $\textsc{STACKFEED}$. ① marks a more precise and structured statement of the information, produced by $\textsc{STACKFEED}$'s edits. ② shows a fix rewritten more coherently, with details added for the agent from observations made along the trajectory. ③ shows information on resolution added from the train set.
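
The state transition described in the Figure 2 caption (critic reflects on failures, assigns document-wise credit, actors edit their documents) can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: the `KB` representation, the `Failure` record, and the `critic`, `actor`, and `transition` functions are all hypothetical names, the critic and actors are plain string stubs rather than LLM calls, and the MCTS search over states is omitted entirely.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical KB representation: document id -> document text.
KB = Dict[str, str]


@dataclass
class Failure:
    """A failed query together with the feedback received for it (assumed shape)."""
    question: str
    feedback: str          # expert or environment feedback on the wrong answer
    retrieved: List[str]   # ids of the documents retrieved for this question


def critic(kb: KB, failures: List[Failure]) -> Dict[str, str]:
    """Stub for the centralized critic: blames every retrieved document and
    forwards the feedback to it as a document-wise reflection."""
    notes: Dict[str, List[str]] = {}
    for failure in failures:
        for doc_id in failure.retrieved:
            if doc_id in kb:
                notes.setdefault(doc_id, []).append(failure.feedback)
    return {doc_id: " ".join(items) for doc_id, items in notes.items()}


def actor(text: str, reflection: str) -> str:
    """Stub for a per-document actor: appends a note instead of asking an LLM
    to produce a structured edit."""
    return text + "\nNOTE: " + reflection


def transition(kb: KB, failures: List[Failure]) -> KB:
    """One state transition: returns a new KB state, leaving the old one intact."""
    new_kb = dict(kb)
    for doc_id, reflection in critic(kb, failures).items():
        new_kb[doc_id] = actor(kb[doc_id], reflection)
    return new_kb


s0 = {"strings.md": "Strings are immutable.", "loops.md": "Use `for` to iterate."}
failures = [Failure("How do I loop?", "Mention the `while` loop.", ["loops.md"])]
s1 = transition(s0, failures)
```

Returning a fresh dictionary rather than editing in place keeps each KB state immutable, which is what a tree search over states would need in order to revisit and compare earlier states.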