YpathRAG: A Retrieval-Augmented Generation Framework and Benchmark for Pathology

Deshui Yu, Yizhi Wang, Saihui Jin, Taojie Zhu, Fanyi Zeng, Wen Qian, Zirui Huang, Jingli Ouyang, Jiameng Li, Zhen Song, Tian Guan, Yonghong He

TL;DR

YpathRAG is a pathology-oriented RAG framework that combines dual-channel hybrid retrieval (BGE-M3 dense retrieval coupled with vocabulary-guided sparse retrieval) with an LLM-based supportive-evidence judgment module, closing the retrieval-judgment-generation loop.

Abstract

Large language models (LLMs) excel on general tasks yet still hallucinate in high-barrier domains such as pathology. Prior work often relies on domain fine-tuning, which neither expands the knowledge boundary nor enforces evidence-grounded constraints. We therefore build a pathology vector database covering 28 subfields and 1.53 million paragraphs, and present YpathRAG, a pathology-oriented RAG framework with dual-channel hybrid retrieval (BGE-M3 dense retrieval coupled with vocabulary-guided sparse retrieval) and an LLM-based supportive-evidence judgment module that closes the retrieval-judgment-generation loop. We also release two evaluation benchmarks, YpathR and YpathQA-M. On YpathR, YpathRAG attains Recall@5 of 98.64%, a gain of 23 percentage points over the baseline; on YpathQA-M, a set of the 300 most challenging questions, it increases the accuracies of both general and medical LLMs by 9.0% on average and up to 15.6%. These results demonstrate improved retrieval quality and factual reliability, providing a scalable construction paradigm and interpretable evaluation for pathology-oriented RAG.
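
To make the Recall@5 figure concrete, here is a minimal sketch of how a Recall@k metric is commonly computed. The abstract does not spell out the exact definition used for YpathR, so this assumes the per-query variant (the fraction of a query's relevant passages that appear in the top k), averaged over queries. The function names and the toy passage IDs are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant passage IDs that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5
) -> float:
    """Average Recall@k over every query that has relevance labels."""
    scores = [recall_at_k(runs[q], qrels[q], k) for q in qrels]
    return sum(scores) / len(scores)


# Toy data (hypothetical): ranked passage IDs per query and their gold labels.
runs = {
    "q1": ["a", "b", "c", "d", "e", "f"],  # "f" is ranked 6th, outside the top 5
    "q2": ["x", "y", "z"],
}
qrels = {"q1": {"a", "f"}, "q2": {"y"}}

print(mean_recall_at_k(runs, qrels, k=5))
```

If the benchmark instead counts a query as a hit whenever at least one relevant passage is in the top k, `recall_at_k` would return 1.0 or 0.0 per query; the averaging step stays the same.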

Paper Structure

This paper contains 6 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Overall architecture of YpathRAG, a pathology-oriented yet diagnosis- and multimodal-capable RAG framework integrating dense retrieval, lexicon-guided sparse retrieval, and an LLM-based relevance filter for factual grounding and semantic consistency across knowledge QA, diagnostic reasoning, and multimodal QA.
  • Figure 2: Overall workflow of the YpathRAG framework. The framework integrates hybrid retrieval (sparse and dense channels), supportive evidence discrimination, and a two-stage LLM generation strategy, thereby systematically enhancing retrieval and question-answering performance in pathology tasks.
  • Figure 3: Data processing workflow of the pathology vector corpus. The system extracts text from multi-source documents through OCR, performs semantic chunking and domain-specific refinement, and finally builds a high-quality pathology corpus containing 1.53 million semantic segments for retrieval and question answering.
  • Figure 4: Construction pipeline of the YpathRAG pathology lexicon. Curated pathology texts undergo medical lexicon tokenization and new-word detection, followed by pathology filtering and LLM validation, yielding a clean lexicon that separates pathology-specific and generic medical terms. The resulting vocabulary powers the sparse retrieval channel in our hybrid RAG.
  • Figure 5: Construction workflow of the pathology evaluation benchmarks. The process consists of three stages: (1) automatic generation of expert-level questions and reference answers from authoritative literature; (2) multi-level construction of positive and negative samples (P1–P3, A1–A4) based on semantic support strength; and (3) GPT-4o-assisted support scoring and difficulty assessment to produce interpretable, high-quality benchmarks for pathology retrieval and QA.
  • ...and 3 more figures
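
The hybrid retrieval and evidence-filtering flow described in the captions of Figures 1 and 2 can be illustrated with a small sketch. The captions do not say how the dense and sparse channels are merged, so this example swaps in reciprocal rank fusion (RRF) purely as an assumption, and it stands in a plain callable for the LLM-based supportive-evidence judge. All names (`rrf_fuse`, `retrieve_and_judge`, the passage IDs) are hypothetical, not the authors' implementation.

```python
from typing import Callable, Dict, List


def rrf_fuse(rankings: List[List[str]], c: int = 60) -> List[str]:
    """Merge several ranked lists with reciprocal rank fusion (assumed here)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by passage ID for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def retrieve_and_judge(
    query: str,
    dense_ranking: List[str],
    sparse_ranking: List[str],
    judge: Callable[[str, str], bool],
    top_k: int = 5,
) -> List[str]:
    """Fuse both channels, keep the top-k, then drop passages the judge rejects."""
    fused = rrf_fuse([dense_ranking, sparse_ranking])[:top_k]
    return [pid for pid in fused if judge(query, pid)]


# Toy rankings (hypothetical) standing in for the dense and sparse channels.
dense = ["p1", "p2", "p3"]
sparse = ["p3", "p1", "p4"]


def toy_judge(query: str, pid: str) -> bool:
    """Placeholder for the LLM judge: rejects one passage as unsupportive."""
    return pid != "p2"


evidence = retrieve_and_judge("What defines lesion X?", dense, sparse, toy_judge)
print(evidence)
```

Only the passages that survive the judge would be handed to the generation step, which is the point of closing the retrieval-judgment-generation loop.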