Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

Zhongzhen Huang; Kui Xue; Yongqi Fan; Linjie Mu; Ruoyu Liu; Tong Ruan; Shaoting Zhang; Xiaofan Zhang

TL;DR

Addressing hallucinations and temporal misalignment in medical LLMs, the paper introduces MedicineQA, a multi-round medication-consultation benchmark, and RagPULSE, a Distill-Retrieve-Read retrieval-augmented pipeline that uses tool calling to distill dialogue history into search queries and retrieve evidence from an entity-oriented medicine database. Experiments show RagPULSE consistently outperforms open-source baselines and rivals commercial systems on evidence retrieval and grounded response generation, with ablations confirming the value of history distillation and tool-assisted querying. The approach demonstrates the practical potential of retrieval-augmented medical LLMs for safer, more reliable medication consultations and offers a scalable framework for domain-specific knowledge integration.

Abstract

Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, retrieval-augmented generation (RAG) has been used to supply external knowledge that supports answer generation. However, applying such models to the medical domain faces several challenges due to the lack of domain-specific knowledge and the intricacy of real-world scenarios. In this study, we explore LLMs within a RAG framework for knowledge-intensive tasks in the medical field. To evaluate the capabilities of LLMs, we introduce MedicineQA, a multi-round dialogue benchmark that simulates the real-world medication consultation scenario and requires LLMs to answer with evidence retrieved from a medicine database. MedicineQA contains 300 multi-round question-answering pairs, each embedded within a detailed dialogue history, highlighting the challenge this knowledge-intensive task poses to current LLMs. We further propose a new Distill-Retrieve-Read framework in place of the previous Retrieve-then-Read. Specifically, the distillation and retrieval process uses a tool calling mechanism to formulate search queries that emulate the keyword-based inquiries used by search engines. Experimental results show that our framework brings notable performance improvements and surpasses previous counterparts in evidence retrieval accuracy. This advancement sheds light on applying RAG to the medical domain.

Paper Structure (13 sections, 2 equations, 4 figures, 3 tables)

Introduction
Related Work
Method
Benchmark Creation
RagPULSE
Experiments
Experimental Settings
Results
Ablation Studies
Case Study
Conclusion
Appendix
Details of Elo

Figures (4)

Figure 1: The medication consultation: a detailed discussion between healthcare professionals and users about prescribed medications, including their names, indications, usage, side effects, etc. Professionals utilize the knowledge in the medicine database to provide a more robust response.
Figure 2: (a) The distribution of our proposed MedicineQA. MedicineQA involves ten specific scenarios of the medication consultation. The distribution of the benchmark is similar to that of the real scenario. (b) Samples of the benchmark: Interaction, Adverse reactions, and Contraindications. Our benchmark is available in both English and Chinese.
Figure 3: The overall workflow of RagPULSE in the medication consultation scenario, which consists of three steps: (1) distilling the key information from the dialogue history and forming the search query; (2) retrieving the corresponding medicine evidence from the medicine database; (3) generating the response according to the retrieved evidence.
Figure 4: Case studies of LLMs' retrieval process and generated responses. LLMs first summarize the dialogue history and then generate search queries. The responses are formulated via the retrieved document. Key information is marked by red text.
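
The three-step workflow described in the Figure 3 caption can be illustrated with a small, hedged sketch. This is not the authors' implementation: a keyword scan stands in for the LLM tool call, the database is a toy in-memory dictionary with placeholder entries (not real medical data), and every name here (SearchQuery, MEDICINE_DB, distill, retrieve, read, consult) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchQuery:
    medicine: str   # entity name the query targets
    attribute: str  # aspect of the entity, e.g. "adverse reactions"


# Toy entity-oriented database keyed by (medicine, attribute).
# Entries are placeholders, NOT real medical information.
MEDICINE_DB = {
    ("drug-a", "adverse reactions"): "Placeholder: adverse reactions of drug-a.",
    ("drug-a", "contraindications"): "Placeholder: contraindications of drug-a.",
    ("drug-b", "interaction"): "Placeholder: interactions of drug-b.",
}


def distill(history: list[str]) -> Optional[SearchQuery]:
    """Step 1: condense the dialogue history into a keyword-style query.

    Assumption: a simple keyword scan replaces the LLM tool call.
    """
    medicine = attribute = None
    for turn in history:  # later turns override earlier mentions
        lowered = turn.lower()
        for name, attr in MEDICINE_DB:
            if name in lowered:
                medicine = name
            if attr in lowered:
                attribute = attr
    if medicine is None or attribute is None:
        return None
    return SearchQuery(medicine, attribute)


def retrieve(query: SearchQuery) -> Optional[str]:
    """Step 2: look up evidence for the (medicine, attribute) pair."""
    return MEDICINE_DB.get((query.medicine, query.attribute))


def read(question: str, evidence: Optional[str]) -> str:
    """Step 3: build the grounded prompt an LLM would answer from."""
    if evidence is None:
        return f"No evidence found for: {question}"
    return f"Evidence: {evidence}\nQuestion: {question}"


def consult(history: list[str]) -> str:
    """Run distill, retrieve, and read on a multi-round dialogue."""
    query = distill(history)
    evidence = retrieve(query) if query else None
    return read(history[-1], evidence)


history = [
    "I was prescribed drug-a last week.",
    "Understood. What would you like to know?",
    "Are there any adverse reactions I should watch for?",
]
print(consult(history))
```

The point of the sketch is that the final user turn alone never names the medicine, so retrieving on it directly would miss the evidence; the distillation step recovers the entity from earlier turns before the lookup.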
