DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router

Minghao Guo, Qingcheng Zeng, Xujiang Zhao, Yanchi Liu, Wenchao Yu, Mengnan Du, Haifeng Chen, Wei Cheng

TL;DR

DeepSieve is an agentic RAG framework that performs information sieving with an LLM acting as a knowledge router: it decomposes complex queries into structured sub-questions, recursively routes each sub-question to the most suitable knowledge source, and filters the retrieved information through a multi-stage distillation process.

Abstract

Large Language Models (LLMs) excel at many reasoning tasks but struggle with knowledge-intensive queries because they cannot dynamically access up-to-date or domain-specific information. Retrieval-Augmented Generation (RAG) has emerged as a promising solution, enabling LLMs to ground their responses in external sources. However, existing RAG methods lack fine-grained control over both the query side and the source side, often resulting in noisy retrieval and shallow reasoning. In this work, we introduce DeepSieve, an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router. DeepSieve decomposes complex queries into structured sub-questions and recursively routes each to the most suitable knowledge source, filtering irrelevant information through a multi-stage distillation process. Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design. Experiments on multi-hop QA tasks across heterogeneous sources demonstrate improved reasoning depth, retrieval precision, and interpretability over conventional RAG approaches. Our code is available at https://github.com/MinghoKwok/DeepSieve.
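The decompose, route, and re-route loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (that lives in the linked repository): the decomposition is hard-coded in place of an LLM decomposer, `route` scores sources by keyword overlap in place of the LLM router, and `KnowledgeSource`, `route`, `sieve`, and the fictional book data are all hypothetical. Sub-questions form a simple chain (the most basic DAG), and `{0}`, `{1}`, ... placeholders are filled from earlier sub-answers held in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class KnowledgeSource:
    """Hypothetical knowledge source: a routing profile plus a toy lookup index."""
    name: str
    profile: Set[str]                                    # keywords describing the source's coverage
    facts: Dict[str, str] = field(default_factory=dict)  # sub-question -> answer

    def retrieve(self, subq: str) -> Optional[str]:
        return self.facts.get(subq)


def route(subq: str, sources: List[KnowledgeSource], exclude: Set[str]) -> Optional[KnowledgeSource]:
    """Keyword-overlap stand-in for the LLM router; skips sources that already failed."""
    words = set(subq.split())
    candidates = [s for s in sources if s.name not in exclude]
    if not candidates:
        return None
    return max(candidates, key=lambda s: len(words & s.profile))


def sieve(subqueries: List[str], sources: List[KnowledgeSource], max_attempts: int = 2):
    """Answer a chain of sub-questions; '{0}', '{1}', ... refer to earlier sub-answers."""
    memory: List[str] = []   # sub-answers, in order
    trace: List[tuple] = []  # (sub-question, source name) pairs for inspection
    for template in subqueries:
        subq = template.format(*memory)  # fill in the answers this sub-question depends on
        tried: Set[str] = set()
        answer = None
        for _ in range(max_attempts):
            src = route(subq, sources, tried)
            if src is None:
                break
            answer = src.retrieve(subq)
            trace.append((subq, src.name))
            if answer is not None:
                break
            tried.add(src.name)  # failed retrieval: re-route to a different source
        if answer is None:
            return None, trace  # a fuller system might re-decompose here instead
        memory.append(answer)
    return memory[-1], trace  # toy "fusion": the last hop answers the original query


books = KnowledgeSource("books", {"wrote", "author", "novel"},
                        {"who wrote blue harbor": "ada lin"})
people = KnowledgeSource("people", {"born", "country", "birthplace"},
                         {"which country was ada lin born in": "norway"})

final, trace = sieve(["who wrote blue harbor", "which country was {0} born in"], [books, people])
print(final)  # norway
print(trace)  # [('who wrote blue harbor', 'books'), ('which country was ada lin born in', 'people')]
```

In this sketch, keeping each source behind its own `retrieve` call, rather than merging everything into one index, is what lets the router skip a failed source and try another. A fuller system would presumably replace the toy fusion step (returning the last hop's answer) with an LLM that reasons over all stored sub-answers.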

Paper Structure

This paper contains 55 sections, 6 figures, 6 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Motivation and Overview: Left: Compositional queries are hard to answer under source heterogeneity (e.g., structured, private, and unmergeable databases). Right: DeepSieve performs decomposition, source-aware routing, and iterative fusion to enable structured reasoning.
  • Figure 2: DeepSieve workflow with multi-step reasoning: A complex query is first decomposed into diverse subqueries organized as a directed acyclic graph (DAG). For each path in the DAG, an LLM generates a plan to select knowledge sources via routing. Failed retrievals trigger re-routing or re-decomposition. Retrieved subanswers are stored in memory and later fused across paths to form a final answer.
  • Figure 3: Normalized radar plot comparing agentic methods by their average scores across all benchmarks. The plot evaluates methods along three dimensions: F1 score, EM score, and token efficiency (the inverse of #Tokens). All metrics are normalized so that a higher value indicates better performance on every axis; a larger enclosed area signifies a better trade-off between accuracy and computational cost.
  • Figure 4: Ablation study of F1 score improvements (blue) and declines (red) over Naive RAG across datasets. Color intensity corresponds to the magnitude of performance change, with darker shades indicating stronger effects.
  • Figure 5: A comparison of token costs for each stage of the framework. The stacked bars illustrate the cumulative token usage per subquery, showing the base cost of Decomposition (blue), plus the additional costs from Routing (red) and Reflexion (green).
  • ...and 1 more figure
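Figure 3 states only that all metrics are normalized and that the token axis is inverted. The Python sketch below assumes one plausible scheme, not necessarily the one used in the paper: F1 and EM are divided by the best method's value, token counts are mapped to `min_tokens / tokens` so the cheapest method scores 1, and the enclosed area is computed for equally spaced axes. The function names and the numbers in the example are hypothetical and are not results from the paper.

```python
import math
from typing import Dict, List


def normalize(methods: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """Scale every axis to (0, 1] so that higher is better; token counts are inverted."""
    f1_max = max(m["f1"] for m in methods.values())
    em_max = max(m["em"] for m in methods.values())
    tok_min = min(m["tokens"] for m in methods.values())
    return {
        name: [m["f1"] / f1_max, m["em"] / em_max, tok_min / m["tokens"]]
        for name, m in methods.items()
    }


def radar_area(radii: List[float]) -> float:
    """Area of the polygon whose vertices lie at these radii on equally spaced axes."""
    n = len(radii)
    wedge = math.sin(2 * math.pi / n) / 2  # area factor of the triangle between adjacent axes
    return wedge * sum(radii[i] * radii[(i + 1) % n] for i in range(n))


# Illustrative numbers only; not taken from the paper.
scores = {
    "method_a": {"f1": 50.0, "em": 40.0, "tokens": 2000},
    "method_b": {"f1": 40.0, "em": 32.0, "tokens": 1000},
}
norm = normalize(scores)
print(norm)  # {'method_a': [1.0, 1.0, 0.5], 'method_b': [0.8, 0.8, 1.0]}
for name, radii in norm.items():
    print(name, round(radar_area(radii), 3))
```

With these illustrative numbers, `method_b` gives up some F1 and EM in exchange for half the token cost and ends up with the larger enclosed area, which is the kind of accuracy/cost trade-off the radar plot is meant to expose.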