Optimizing Retrieval-Augmented Generation of Medical Content for Spaced Repetition Learning

Jeremi I. Kaczmarek; Jakub Pokrywka; Krzysztof Biedalak; Grzegorz Kurzyp; Łukasz Grzybowski

TL;DR

The study tackles the need for scalable, high-quality medical education content for Polish specialists by deploying a Retrieval-Augmented Generation (RAG) pipeline integrated with a spaced repetition framework. It introduces a modular system including a Query Rephraser, a SOLR-based Polish medical retrieval engine with a cross-encoder reranker, and GPT-4o-generated, source-backed commentary, all linked to Medico PZWL and SuperMemo. Thorough human-centric evaluation demonstrates improvements in document relevance, credibility, and logical coherence, with total relevant documents rising from 4.59 to 6.83 out of 10 and a robust validation protocol across multiple specialties. The work highlights the potential of RAG to deliver scalable, accurate, and individualized medical learning resources, particularly for non-English audiences, while emphasizing rigorous verification and traceability to authoritative sources.

Abstract

Advances in Large Language Models have revolutionized medical education by enabling scalable and efficient learning solutions. This paper presents a pipeline that employs a Retrieval-Augmented Generation (RAG) system to generate comments for Poland's State Specialization Examination (PES) based on verified resources. The system integrates the generated comments and their source documents with a spaced repetition learning algorithm to enhance knowledge retention while minimizing cognitive overload. By employing a refined retrieval system, a query rephraser, and an advanced reranker, our modified RAG solution prioritizes accuracy over efficiency. A series of experiments, evaluated by medical annotators, shows improvements in key metrics such as document relevance, credibility, and logical coherence of the generated content. This study highlights the potential of RAG systems to provide scalable, high-quality, and individualized educational resources, including for non-English-speaking users.
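
The flow summarized above (rephrase the query, retrieve candidates, rerank them, then generate a source-backed comment) can be illustrated with a minimal sketch. This is not the authors' implementation: every function name, the in-memory corpus, and the token-overlap scoring are hypothetical stand-ins chosen so the example runs without external services. In the described system, retrieval is handled by a SOLR-based engine, reranking by a cross-encoder, and generation by GPT-4o; here the final step only assembles the prompt that such a model might receive.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> Set[str]:
    # Lowercase, punctuation-stripped word set; a crude stand-in for real analysis.
    tokens = (t.strip(".,;:?!()").lower() for t in text.split())
    return {t for t in tokens if t}


def rephrase_query(question: str) -> str:
    # Hypothetical placeholder: a real rephraser would rewrite the exam
    # question into a search-friendly query. Here we only normalize whitespace.
    return " ".join(question.split())


def retrieve(query: str, corpus: List[Document], top_k: int = 10) -> List[Document]:
    # Toy first-stage retrieval by token overlap (stand-in for a search engine).
    q = tokenize(query)
    hits = [(len(q & tokenize(d.text)), d) for d in corpus]
    hits = [(s, d) for s, d in hits if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits[:top_k]]


def jaccard(query: str, text: str) -> float:
    # Toy relevance score (stand-in for a cross-encoder's query-document score).
    q, t = tokenize(query), tokenize(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def rerank(query: str, docs: List[Document],
           score_fn: Callable[[str, str], float], top_n: int = 3) -> List[Document]:
    # Second stage: re-score each candidate against the query, keep the best.
    return sorted(docs, key=lambda d: score_fn(query, d.text), reverse=True)[:top_n]


def build_prompt(question: str, docs: List[Document]) -> str:
    # Assemble the generation prompt; source ids are kept for traceability.
    sources = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (f"Question: {question}\nSources:\n{sources}\n"
            "Write a commentary that cites the source ids.")


def run_pipeline(question: str, corpus: List[Document]) -> Tuple[List[Document], str]:
    query = rephrase_query(question)
    candidates = retrieve(query, corpus)
    top_docs = rerank(query, candidates, jaccard)
    return top_docs, build_prompt(question, top_docs)


if __name__ == "__main__":
    # Placeholder corpus; the texts are illustrative, not medical content.
    corpus = [
        Document("d1", "Hypertension treatment guidelines overview"),
        Document("d2", "Hypertension in pregnancy"),
        Document("d3", "Asthma inhaler technique"),
    ]
    docs, prompt = run_pipeline("hypertension   treatment", corpus)
    print([d.doc_id for d in docs])
    print(prompt)
```

Keeping retrieval and reranking as separate stages mirrors the accuracy-over-efficiency trade-off mentioned in the abstract: a cheap first pass narrows the candidate set, and a more expensive scorer then orders only those candidates.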
