MedNuggetizer: Confidence-Based Information Nugget Extraction from Medical Documents

Gregor Donabauer, Samy Ateia, Udo Kruschwitz, Maximilian Burger, Matthias May, Christian Gilfrich, Maximilian Haas, Julio Ruben Rodas Garzaro, Christoph Eckl

TL;DR

MedNuggetizer tackles reproducibility concerns in LLM-based evidence extraction by providing a query-driven nugget extraction and clustering pipeline that emphasizes repeated sampling and confidence-based filtering. It integrates Information Nugget Extraction, BERTopic-based clustering, and LLM-driven summary generation to produce high-confidence, query-focused medical evidence across multiple documents, validated in a urology use case on antibiotic prophylaxis before prostate biopsy. The approach demonstrates improved reliability of extracted recommendations and supports organized, cross-document evidence synthesis for clinicians and researchers. The work also offers open-source tooling and data to promote reproducible evaluation of LLM-driven medical information extraction.

Abstract

We present MedNuggetizer (https://mednugget-ai.de/; access is available upon request), a tool for query-driven extraction and clustering of information nuggets from medical documents to support clinicians in exploring underlying medical evidence. Backed by a large language model (LLM), MedNuggetizer performs repeated extractions of information nuggets that are then grouped to generate reliable evidence within and across multiple documents. We demonstrate its utility on the clinical use case of antibiotic prophylaxis before prostate biopsy, using major urological guidelines and recent PubMed studies as sources of information. Evaluation by domain experts shows that MedNuggetizer provides clinicians and researchers with an efficient way to explore long documents and easily extract reliable, query-focused medical evidence.
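
The idea of repeated extraction followed by confidence-based filtering can be illustrated with a minimal sketch. This is not the tool's implementation: the actual system relies on an LLM for extraction and BERTopic for clustering, whereas the sketch below assumes the extraction runs are already available as lists of strings and swaps in a crude text normalization as a stand-in for clustering. The function names, the `min_confidence` parameter, and the placeholder statements are all hypothetical.

```python
from collections import defaultdict


def normalize(nugget: str) -> str:
    """Crude grouping key (stand-in for real clustering):
    lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in nugget.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def confident_nuggets(runs, min_confidence=0.6):
    """Return (nugget, confidence) pairs, where confidence is the fraction
    of extraction runs in which an equivalent nugget appeared."""
    support = defaultdict(set)  # grouping key -> indices of runs containing it
    surface = {}                # grouping key -> first surface form seen
    for i, run in enumerate(runs):
        for nugget in run:
            key = normalize(nugget)
            support[key].add(i)
            surface.setdefault(key, nugget)
    scored = [(surface[k], len(idx) / len(runs)) for k, idx in support.items()]
    kept = [(n, c) for n, c in scored if c >= min_confidence]
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Placeholder output of three hypothetical extraction runs on one document.
runs = [
    ["Statement A.", "Statement B"],
    ["statement a", "Statement C"],
    ["Statement A", "statement b."],
]

for nugget, confidence in confident_nuggets(runs, min_confidence=0.6):
    print(f"{confidence:.2f}  {nugget}")
```

A nugget that shows up in only one of several runs falls below the threshold and is dropped, which is the intuition behind using repeated sampling as a reliability signal.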

Paper Structure

This paper contains 9 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Screenshot of the MedNuggetizer web interface. The system allows users to upload PDF files and to specify a query and clustering-specific parameters.