PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning

Hieu Tran, Zonghai Yao, Nguyen Luong Tran, Zhichao Yang, Feiyun Ouyang, Shuo Han, Razieh Rahimi, Hong Yu

TL;DR

PRIME tackles efficient, knowledge-intensive reasoning in large language models by mirroring human dual-process cognition. It combines a fast System 1 Quick Thinking Agent with a selective System 2 deliberation triggered by a Reflection Agent, which coordinates Planning, Retrieval, and Hypothesis Testing to ground answers in external evidence. Across medical and multi-hop benchmarks, PRIME improves accuracy and reduces hallucinations, with open-source LLaMA models reaching or surpassing GPT-4o on several tasks. Ablation and difficulty-aware analyses confirm the value of selective System 2 engagement and modular agent collaboration, though gating reliability and latency remain important considerations for deployment.

Abstract

Inspired by the dual-process theory of human cognition from Thinking, Fast and Slow, we introduce PRIME (Planning and Retrieval-Integrated Memory for Enhanced Reasoning), a multi-agent reasoning framework that dynamically integrates System 1 (fast, intuitive thinking) and System 2 (slow, deliberate thinking). PRIME first employs a Quick Thinking Agent (System 1) to generate a rapid answer; if uncertainty is detected, it then triggers a structured System 2 reasoning pipeline composed of specialized agents for planning, hypothesis generation, retrieval, information integration, and decision-making. This multi-agent design faithfully mimics human cognitive processes and enhances both efficiency and accuracy. Experimental results with LLaMA 3 models demonstrate that PRIME enables open-source LLMs to perform competitively with state-of-the-art closed-source models like GPT-4 and GPT-4o on benchmarks requiring multi-hop and knowledge-grounded reasoning. This research establishes PRIME as a scalable solution for improving LLMs in domains requiring complex, knowledge-intensive reasoning.
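
The control flow described in the abstract (answer quickly first, escalate only when uncertain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `answer_question`, `quick_think`, `is_uncertain`, and `deliberate`, the numeric confidence score, and the 0.5 threshold are all hypothetical, and the stub functions stand in for what would be LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # hypothetical score in [0, 1]; the real uncertainty signal may differ
    used_system2: bool = False


def answer_question(
    question: str,
    quick_think: Callable[[str], Answer],
    is_uncertain: Callable[[str, Answer], bool],
    deliberate: Callable[[str, Answer], Answer],
) -> Answer:
    """Run System 1 first; escalate to System 2 only if the draft looks uncertain."""
    draft = quick_think(question)
    if not is_uncertain(question, draft):
        return draft  # fast path: System 2 is never invoked
    final = deliberate(question, draft)
    final.used_system2 = True
    return final


# Stub agents standing in for LLM calls (assumptions for illustration only).
def stub_quick_think(question: str) -> Answer:
    confident = "capital of France" in question
    return Answer("Paris" if confident else "unsure", 0.95 if confident else 0.30)


def stub_is_uncertain(question: str, draft: Answer, threshold: float = 0.5) -> bool:
    return draft.confidence < threshold


def stub_deliberate(question: str, draft: Answer) -> Answer:
    return Answer(f"evidence-grounded answer to: {question}", 0.80)


easy = answer_question(
    "What is the capital of France?",
    stub_quick_think, stub_is_uncertain, stub_deliberate,
)
hard = answer_question(
    "Which enzyme metabolizes warfarin?",
    stub_quick_think, stub_is_uncertain, stub_deliberate,
)
print(easy.used_system2, hard.used_system2)
```

Passing the three stages in as callables keeps the gate independent of any particular model, so the cost of System 2 is paid only on the questions where the gate fires.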

Paper Structure

This paper contains 18 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of our reasoning process. The framework mimics human dual-system cognition by integrating fast, intuitive reasoning (System 1) and slower, deliberative reasoning (System 2).
  • Figure 2: Quick Thinking Agent for System 1 reasoning. Upon receiving a question, the Quick Thinking Agent rapidly decomposes it into a series of subquestions and answers each one sequentially.
  • Figure 3: Memory recall process in System 2 reasoning. When System 2 is triggered, the Planning Agent decomposes the question into targeted subquestions. For each subquestion requiring external knowledge, the Search Agent issues domain-specific queries and retrieves relevant documents. The Reading Agent then distills key information from the retrieved evidence. This mimics human memory recall, where reasoning is guided by selectively retrieving and integrating key facts rather than exact memorization of full documents.
  • Figure 4: Hypothesis generation and testing in System 2 reasoning. The Hypothesis Agent formulates multiple initial hypotheses based on the question. The Integration Agent then evaluates each hypothesis by aligning it with key evidence from the memory recall phase. This process mimics human scientific reasoning by generating, testing, and selecting the most plausible explanation.
  • Figure 5: Accuracy vs. Number of Generated Tokens for Different Methods
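
The memory recall and hypothesis testing stages described in the captions of Figures 3 and 4 can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the two-sentence `CORPUS`, the token-overlap scoring, and the function names `plan`, `search`, `read`, `recall`, and `select_hypothesis` are hypothetical stand-ins for the Planning, Search, Reading, Hypothesis, and Integration Agents, which in PRIME would be LLM-driven.

```python
import re
from typing import Dict, List, Set

# Toy corpus standing in for an external knowledge source (hypothetical data).
CORPUS: Dict[str, str] = {
    "doc1": "Metformin is a first-line treatment for type 2 diabetes.",
    "doc2": "Insulin is required for type 1 diabetes.",
}


def tokens(text: str) -> Set[str]:
    """Lowercase word and number tokens, used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def plan(question: str) -> List[str]:
    """Planning Agent stand-in: split the question into subquestions on ' and '."""
    return [part.strip() for part in question.split(" and ") if part.strip()]


def search(subquestion: str, top_k: int = 1) -> List[str]:
    """Search Agent stand-in: rank documents by token overlap with the query."""
    query = tokens(subquestion)
    ranked = sorted(
        CORPUS.values(),
        key=lambda doc: len(query & tokens(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def read(documents: List[str]) -> List[str]:
    """Reading Agent stand-in: keep only the first sentence of each document."""
    return [doc.split(". ")[0] for doc in documents]


def recall(question: str) -> List[str]:
    """Memory recall: plan, then search and read for each subquestion."""
    evidence: List[str] = []
    for subquestion in plan(question):
        evidence.extend(read(search(subquestion)))
    return evidence


def select_hypothesis(hypotheses: List[str], evidence: List[str]) -> str:
    """Integration Agent stand-in: pick the hypothesis best supported by the evidence."""
    evidence_tokens = tokens(" ".join(evidence))
    return max(hypotheses, key=lambda h: len(tokens(h) & evidence_tokens))


evidence = recall("What is the first-line treatment for type 2 diabetes?")
best = select_hypothesis(["Metformin", "Insulin"], evidence)
print(best)
```

The sketch keeps distilled evidence separate from the candidate hypotheses, so each hypothesis is scored against the same recalled facts rather than against full documents.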