MDKeyChunker: Single-Call LLM Enrichment with Rolling Keys and Key-Based Restructuring for High-Accuracy RAG

Bhavik Mangla

Abstract

RAG pipelines typically rely on fixed-size chunking, which ignores document structure, fragments semantic units across boundaries, and requires multiple LLM calls per chunk for metadata extraction. We present MDKeyChunker, a three-stage pipeline for Markdown documents that (1) performs structure-aware chunking treating headers, code blocks, tables, and lists as atomic units; (2) enriches each chunk via a single LLM call extracting title, summary, keywords, typed entities, hypothetical questions, and a semantic key, while propagating a rolling key dictionary to maintain document-level context; and (3) restructures chunks by merging those sharing the same semantic key via bin-packing, co-locating related content for retrieval. The single-call design extracts all seven metadata fields in one LLM invocation, eliminating the need for separate per-field extraction passes. Rolling key propagation replaces hand-tuned scoring with LLM-native semantic matching. An empirical evaluation on 30 queries over an 18-document Markdown corpus shows Config D (BM25 over structural chunks) achieves Recall@5=1.000 and MRR=0.911, while dense retrieval over the full pipeline (Config C) reaches Recall@5=0.867. MDKeyChunker is implemented in Python with four dependencies and supports any OpenAI-compatible endpoint.

Paper Structure (42 sections, 2 equations, 4 figures, 7 tables, 2 algorithms)

Figures (4)

  • Figure 1: MDKeyChunker three-stage pipeline: structural Markdown splitting (Stage 1), single-call LLM enrichment with rolling key propagation (Stage 2), and key-based bin-packing restructuring (Stage 3).
  • Figure 2: Rolling key propagation: each chunk enrichment call receives the accumulated key dictionary $K$, enabling the LLM to reuse prior keys (e.g. "admissions process" introduced at $c_2$ is reused at $c_5$) instead of coining synonyms.
  • Figure 3: Chunk-size distributions. Config A (fixed-size) concentrates 99% of chunks at exactly 512 chars. Configs B and C (structural splitting) produce variable-length chunks; Config D shares Config B's chunk set and is labeled on that panel.
  • Figure 4: Retrieval performance across four configurations. Config D (BM25 over structural chunks) achieves perfect Recall@5 and Recall@10; Config A (dense, fixed-size) leads among dense configs at all cut-offs.
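The rolling key propagation of Figure 2 and the key-based restructuring of Stage 3 can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`enrich_with_rolling_keys`, `merge_by_key`, `toy_assign_key`), the dictionary layout, and the character budget are all assumptions made for illustration, the LLM call is replaced by a toy word-matching stub, and first-fit packing is assumed as the bin-packing strategy because the text shown here does not name one.

```python
from collections import defaultdict
from typing import Callable

def toy_assign_key(text: str, known_keys: dict) -> str:
    """Hypothetical stand-in for the LLM call: reuse a known key when all of
    its words occur in the chunk, otherwise coin one from the first two words."""
    lowered = text.lower()
    for key in known_keys:
        if all(word in lowered for word in key.split()):
            return key
    return " ".join(lowered.split()[:2])

def enrich_with_rolling_keys(chunks: list, assign_key: Callable) -> tuple:
    """Stage 2 sketch: every call receives the key dictionary accumulated so far."""
    rolling_keys = {}  # semantic key -> short description (assumed layout)
    enriched = []
    for index, text in enumerate(chunks):
        key = assign_key(text, dict(rolling_keys))  # pass a copy of K
        rolling_keys.setdefault(key, text[:40])     # record newly coined keys
        enriched.append({"index": index, "key": key, "text": text})
    return enriched, rolling_keys

def merge_by_key(enriched: list, max_chars: int) -> list:
    """Stage 3 sketch: first-fit packing of chunks that share a semantic key.
    Sizes count chunk text only; the joining separator is ignored."""
    groups = defaultdict(list)
    for chunk in enriched:
        groups[chunk["key"]].append(chunk)

    merged = []
    for key, members in groups.items():
        bins = []
        for chunk in members:
            size = len(chunk["text"])
            for candidate in bins:
                if candidate["size"] + size <= max_chars:
                    candidate["texts"].append(chunk["text"])
                    candidate["size"] += size
                    break
            else:  # no existing bin had room, so open a new one
                bins.append({"texts": [chunk["text"]], "size": size})
        merged.extend({"key": key, "text": "\n\n".join(b["texts"])} for b in bins)
    return merged

# Toy input: the first and third chunks should end up under the same key.
chunks = [
    "Admissions process overview for applicants.",
    "Tuition fees are listed per term.",
    "The admissions process requires two forms.",
]
enriched, rolling_keys = enrich_with_rolling_keys(chunks, toy_assign_key)
merged = merge_by_key(enriched, max_chars=200)
for item in merged:
    print(item["key"], "->", repr(item["text"]))
```

In this toy run the third chunk reuses the key coined for the first, so the two are merged into one retrieval unit while the unrelated chunk stays separate. Lowering `max_chars` below their combined length keeps them in separate bins.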