KohakuRAG: A simple RAG framework with hierarchical document indexing

Shih-Ying Yeh; Yueh-Feng Ku; Ko-Wei Huang; Buu-Khang Tu

TL;DR

KohakuRAG is a hierarchical RAG framework that preserves document structure through a four-level tree representation with bottom-up embedding aggregation, improves retrieval coverage through an LLM-powered query planner with cross-query reranking, and stabilizes answers through ensemble inference with abstention-aware voting.

Abstract

Retrieval-augmented generation (RAG) systems that answer questions from document collections face compounding difficulties when high-precision citations are required: flat chunking strategies sacrifice document structure, single-query formulations miss relevant passages through vocabulary mismatch, and single-pass inference produces stochastic answers that vary in both content and citation selection. We present KohakuRAG, a hierarchical RAG framework that preserves document structure through a four-level tree representation (document $\rightarrow$ section $\rightarrow$ paragraph $\rightarrow$ sentence) with bottom-up embedding aggregation, improves retrieval coverage through an LLM-powered query planner with cross-query reranking, and stabilizes answers through ensemble inference with abstention-aware voting. We evaluate on the WattBot 2025 Challenge, a benchmark requiring systems to answer technical questions from 32 documents with $\pm$0.1% numeric tolerance and exact source attribution. KohakuRAG achieves first place on both public and private leaderboards (final score 0.861), as the only team to maintain the top position across both evaluation partitions. Ablation studies reveal that prompt ordering (+80% relative), retry mechanisms (+69%), and ensemble voting with blank filtering (+1.2pp) each contribute substantially, while hierarchical dense retrieval alone matches hybrid sparse-dense approaches (BM25 adds only +3.1pp). We release KohakuRAG as open-source software at https://github.com/KohakuBlueleaf/KohakuRAG.
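To make the four-level tree and bottom-up aggregation concrete, here is a minimal sketch. It assumes mean pooling of child embeddings and uses toy 2-dimensional vectors in place of a real embedding model; the `Node` class, the `aggregate` function, and the pooling rule are illustrative assumptions, not the KohakuRAG API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the document -> section -> paragraph -> sentence tree."""
    node_id: str
    embedding: Optional[List[float]] = None  # set directly on sentence leaves only
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> List[float]:
    """Fill in embeddings bottom-up.

    ASSUMPTION: a parent embedding is the unweighted mean of its children.
    The abstract only says embeddings are aggregated bottom-up.
    """
    if not node.children:
        if node.embedding is None:
            raise ValueError(f"leaf {node.node_id} has no embedding")
        return node.embedding
    child_vecs = [aggregate(child) for child in node.children]
    dim = len(child_vecs[0])
    node.embedding = [
        sum(vec[i] for vec in child_vecs) / len(child_vecs) for i in range(dim)
    ]
    return node.embedding


# Toy sentence embeddings (hypothetical values).
s1 = Node("sec1:p1:s1", embedding=[1.0, 0.0])
s2 = Node("sec1:p1:s2", embedding=[0.0, 1.0])
s3 = Node("sec1:p2:s1", embedding=[1.0, 1.0])
p1 = Node("sec1:p1", children=[s1, s2])
p2 = Node("sec1:p2", children=[s3])
sec1 = Node("sec1", children=[p1, p2])
doc = Node("doc", children=[sec1])

aggregate(doc)  # every paragraph, section, and document node now has an embedding
```

After the call, nodes at every level carry a vector, so a single index can return matches at sentence, paragraph, or section granularity.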

Paper Structure (120 sections, 6 equations, 27 figures, 16 tables, 5 algorithms)

Introduction
Preliminaries
Large Language Models
Text Embeddings and Retrieval
Dense Vector Search.
Sparse Vector Search.
Retrieval-Augmented Generation
Problem Formulation: The WattBot 2025 Challenge
Task Definition.
Evaluation Protocol.
Related Work
Document Chunking and Hierarchical Indexing.
Query Expansion and Multi-Query Retrieval.
Self-Reflection and Corrective RAG.
Ensemble Methods for RAG.
...and 105 more sections

Figures (27)

Figure 1: Overview of KohakuRAG. Left (Hierarchical Indexing): Documents are parsed into tree structures with sections, paragraphs (Para), and sentences (S). Sentence embeddings are computed and aggregated bottom-up to parent levels, then stored in a Vector DB. Center (Multi-Query Retrieval): Given a question, the Query Planner (LLM) generates multiple related queries, each retrieving Top-K results that are merged via Cross-Query Reranking. Right (Ensemble Inference): Context and question are sent to the LLM for $m$ independent runs; blank responses are filtered (X), and majority voting produces the final answer.
Figure 2: Hierarchical context expansion. Given a query, we retrieve Top-K nodes across multiple granularity levels (sentences in cyan, paragraphs in green, sections in purple). Section-level nodes are filtered out as too coarse (marked with X). The remaining nodes are expanded by adding their parent nodes to provide broader context. For example, a matched sentence sec1:p1:s1 brings its parent paragraph sec1:p1, while a matched paragraph sec2:p2 brings its parent section sec2.
Figure 3: Retry mechanism flow. When the LLM outputs is_blank=true (insufficient evidence), the system increases top-$k$ and re-retrieves context. This loop continues until a valid answer is produced or the retry limit is reached, at which point the system outputs a blank response.
Figure 4: Ensemble voting with blank handling. Case 1 (Mixed Responses): When non-blank answers exist among 5 runs {a, a, blank, b, a}, blank responses are filtered (X) before majority voting, yielding {a, a, b, a} with majority "a" (3 votes). Case 2 (All Blank): When all runs return blank, the system correctly abstains rather than hallucinating, preserving the hallucination score.
Figure 5: Dual-path image processing. Left: Caption-based retrieval via Qwen3-VL captioning and Jina v3/v4 text embedding. Right: Direct image embedding via Jina v4's multimodal encoder.
...and 22 more figures
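The control flow described in the captions for Figures 2 to 4 can be sketched in a few lines. This is a minimal illustration, not the released implementation: the function names, the `ask` callback, the top-$k$ growth schedule, and the tie-breaking rule are all assumptions, and node IDs follow the `sec1:p1:s1` pattern used in the Figure 2 caption.

```python
from collections import Counter
from typing import Callable, List, Optional

BLANK = None  # stands for a response with is_blank=true


def vote(answers: List[Optional[str]]) -> Optional[str]:
    """Figure 4: filter blank responses, then majority-vote; abstain if all blank.

    ASSUMPTION: ties go to the answer that appeared first.
    """
    valid = [a for a in answers if a is not BLANK]
    if not valid:
        return BLANK
    return Counter(valid).most_common(1)[0][0]


def answer_with_retry(
    ask: Callable[[str, int], Optional[str]],
    question: str,
    top_k: int = 4,
    max_retries: int = 2,
    step: int = 4,
) -> Optional[str]:
    """Figure 3: on a blank response, increase top-k and re-retrieve.

    ASSUMPTION: top-k grows linearly by `step`; the caption only says it increases.
    `ask` is a hypothetical callback that retrieves top_k nodes and queries the LLM.
    """
    for attempt in range(max_retries + 1):
        result = ask(question, top_k + attempt * step)
        if result is not BLANK:
            return result
    return BLANK  # retry limit reached: output a blank response


def expand_with_parents(matched_ids: List[str]) -> List[str]:
    """Figure 2: filter out section-level matches, then add each node's parent."""
    kept = [i for i in matched_ids if ":" in i]  # "sec1" alone is section level
    expanded: List[str] = []
    for node_id in kept:
        for candidate in (node_id, node_id.rsplit(":", 1)[0]):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(vote(["a", "a", BLANK, "b", "a"]))                       # Case 1 in Figure 4
print(expand_with_parents(["sec1:p1:s1", "sec1", "sec2:p2"]))  # example in Figure 2
```

Filtering blanks before counting keeps a minority of abstentions from overriding answers that found evidence, while the all-blank case still yields an abstention.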
