Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models
Zhongzhen Huang, Kui Xue, Yongqi Fan, Linjie Mu, Ruoyu Liu, Tong Ruan, Shaoting Zhang, Xiaofan Zhang
TL;DR
Addressing hallucinations and temporal misalignment in medical LLMs, the paper introduces MedicineQA, a multi-round medication-consultation benchmark, and RagPULSE, a Distill-Retrieve-Read retrieval-augmented pipeline that uses tool calling to distill dialogue history into search queries and retrieve evidence from an entity-oriented medicine database. Experiments show RagPULSE consistently outperforms open-source baselines and rivals commercial systems on evidence retrieval and grounded response generation, with ablations confirming the value of history distillation and tool-assisted querying. The approach demonstrates the practical potential of retrieval-augmented medical LLMs for safer, more reliable medication consultations and offers a scalable framework for domain-specific knowledge integration.
Abstract
Large language models (LLMs) have achieved remarkable success across a wide range of language tasks, but they suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, retrieval-augmented generation (RAG) has been used to supply external knowledge during answer generation. However, applying such models to the medical domain is challenging because of the lack of domain-specific knowledge and the intricacy of real-world scenarios. In this study, we explore LLMs with a RAG framework for knowledge-intensive tasks in the medical field. To evaluate the capabilities of LLMs, we introduce MedicineQA, a multi-round dialogue benchmark that simulates real-world medication consultation and requires LLMs to answer with evidence retrieved from a medicine database. MedicineQA contains 300 multi-round question-answering pairs, each embedded within a detailed dialogue history, which highlights the challenge this knowledge-intensive task poses to current LLMs. We further propose a new Distill-Retrieve-Read framework in place of the previous Retrieve-then-Read paradigm. Specifically, the distillation and retrieval process uses a tool-calling mechanism to formulate search queries that emulate the keyword-based inquiries used by search engines. Experimental results show that our framework brings notable performance improvements and surpasses previous counterparts in evidence retrieval accuracy. This advancement sheds light on applying RAG to the medical domain.
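The contrast between Distill-Retrieve-Read and Retrieve-then-Read is easiest to see on a follow-up question that never names the drug. The following is a minimal Python sketch of the three stages under stated assumptions. The toy database, the `search_medicine` tool name, the aspect keywords, and the `distill`/`retrieve`/`read` helpers are all hypothetical. The LLM that performs distillation and reading in the paper is replaced here by plain string matching and a prompt template. The database entries are placeholders, not medical facts.

```python
from dataclasses import dataclass

# Hypothetical entity-oriented medicine database: one record per drug entity,
# one text field per aspect. Contents are placeholders, not medical facts.
MEDICINE_DB: dict[str, dict[str, str]] = {
    "druga": {"dosage": "DrugA dosage text", "contraindications": "DrugA contraindication text"},
    "drugb": {"dosage": "DrugB dosage text", "contraindications": "DrugB contraindication text"},
}

# Hypothetical mapping from user wording to database aspects (checked in order).
ASPECT_KEYWORDS = {
    "dose": "dosage",
    "dosage": "dosage",
    "how much": "dosage",
    "avoid": "contraindications",
    "contraindication": "contraindications",
}


@dataclass
class ToolCall:
    """A structured tool call, standing in for an LLM's function-call output."""
    name: str
    arguments: dict[str, str]


def distill(history: list[str], question: str) -> ToolCall:
    """Distill (stubbed): compress dialogue history plus the current question into a
    keyword-style query. An LLM would do this; here we use substring matching."""
    entity = ""
    # Scan turns from newest to oldest so the most recently mentioned drug wins.
    for turn in reversed(history + [question]):
        found = [e for e in MEDICINE_DB if e in turn.lower()]
        if found:
            entity = found[0]
            break
    # The aspect comes from the current question only.
    aspect = next((a for kw, a in ASPECT_KEYWORDS.items() if kw in question.lower()), "")
    return ToolCall(name="search_medicine", arguments={"entity": entity, "aspect": aspect})


def retrieve(call: ToolCall) -> list[str]:
    """Retrieve: execute the tool call against the entity-oriented database."""
    if call.name != "search_medicine":
        raise ValueError(f"unknown tool: {call.name}")
    record = MEDICINE_DB.get(call.arguments["entity"], {})
    aspect = call.arguments["aspect"]
    if aspect in record:
        return [record[aspect]]
    return list(record.values())  # fall back to the whole entity record (or [] if unknown)


def read(question: str, evidence: list[str]) -> str:
    """Read (stubbed): build the grounded prompt a generator LLM would answer."""
    context = "\n".join(f"- {e}" for e in evidence) or "- (no evidence found)"
    return f"Evidence:\n{context}\n\nQuestion: {question}\nAnswer using only the evidence above."


history = ["I was prescribed DrugB for my infection.", "Okay, noted."]
question = "How much should I take per day?"
call = distill(history, question)
print(call)  # ToolCall(name='search_medicine', arguments={'entity': 'drugb', 'aspect': 'dosage'})
print(read(question, retrieve(call)))
```

The current question alone contains no drug name, so a Retrieve-then-Read system that searched with the raw question would have no entity to look up. The distill step recovers "drugb" from an earlier turn and pairs it with the "dosage" aspect. This mirrors, in toy form, why the abstract favors keyword-style queries distilled from the full dialogue.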
