MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance

Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page

TL;DR

MALADE is a multi-agent framework that uses retrieval-augmented generation (RAG) to extract adverse drug events (ADEs) from FDA drug labels. It decomposes the task across three specialized agents (DrugFinder, DrugAgent, CategoryAgent), each paired with a Critic, so that reasoning is grounded in external sources and outputs are structured and accompanied by justifications. Evaluated against the OMOP ground truth, it reaches an AUC of up to $\approx 0.90$. The Agent-Critic pattern, combined with RAG and careful task decomposition, yields interpretable results and a generalizable blueprint for trustworthy medical AI in pharmacovigilance (PhV). The work demonstrates that orchestrating LLM-powered agents is feasible and beneficial for high-stakes clinical knowledge synthesis, with open-source tooling and clear pathways for extension to broader PhV tasks.

Abstract

In the era of Large Language Models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction, and summarization. This paper focuses on the problem of Pharmacovigilance (PhV), where the significance and challenges lie in identifying Adverse Drug Events (ADEs) from diverse text sources, such as medical literature, clinical notes, and drug labels. Unfortunately, this task is hindered by factors including variations in the terminologies of drugs and outcomes, and ADE descriptions often being buried in large amounts of narrative text. We present MALADE, the first effective collaborative multi-agent system powered by LLMs with Retrieval Augmented Generation (RAG) for ADE extraction from drug label data. RAG involves augmenting a query to an LLM with relevant information extracted from text resources, and instructing the LLM to compose a response consistent with the augmented data. MALADE is a general LLM-agnostic architecture, and its unique capabilities are: (1) leveraging a variety of external sources, such as medical literature, drug labels, and FDA tools (e.g., OpenFDA drug information API), (2) extracting drug-outcome associations in a structured format along with the strength of each association, and (3) providing explanations for established associations. Instantiated with GPT-4 Turbo or GPT-4o, and FDA drug label data, MALADE demonstrates its efficacy with an Area Under ROC Curve of 0.90 against the OMOP Ground Truth table of ADEs. Our implementation leverages the Langroid multi-agent LLM framework and can be found at https://github.com/jihyechoi77/malade.
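
The retrieval-augmentation step described in the abstract (augment the query with relevant extracts, then instruct the LLM to answer consistently with them) can be illustrated with a minimal Python sketch. This is not the MALADE or Langroid implementation: the function names `retrieve` and `build_prompt`, the bag-of-words cosine scoring, and the prompt wording are all assumptions chosen for illustration, and a real system would likely use a proper retriever over drug label text.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(query: str, passage: str) -> float:
    """Cosine similarity over raw term counts (a stand-in for a real retriever)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Augment the query with retrieved extracts and an instruction to stay
    consistent with them (hypothetical prompt wording)."""
    extracts = retrieve(query, passages, k)
    context = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(extracts))
    return (
        "Answer using only the numbered extracts below and cite them.\n"
        f"Extracts:\n{context}\n\n"
        f"Question: {query}"
    )
```

The string returned by `build_prompt` would then be sent to whichever LLM is in use; that call is deliberately left out, since the architecture is described as LLM-agnostic.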

Paper Structure (57 sections, 14 figures, 11 tables)

Figures (14)

  • Figure 1: Real-world demonstration of our proposed multi-agent orchestration system, MALADE. Handling the user query, "Are ACE Inhibitors associated with Angioedema?", involves a sequence of subtasks performed by three Agents: DrugFinder, DrugAgent, CategoryAgent (each instantiated with GPT-4 Turbo or GPT-4o). Each Agent generates a response and justification, which are validated by a corresponding Critic agent, whose feedback is used by the Agent to revise its response.
  • Figure 2: Example of how iteration among responder methods works when a task T has sub-tasks [T1, T2] and T1 has a sub-task T3.
  • Figure 3: Real-world demonstration of Agent-Critic interactions in MALADE. Given the question of identifying the association between Benzodiazepines and Hip Fracture, we illustrate how CategoryAgent corrects its answers over iterations until the paired Critic is satisfied. See Appendix \ref{app:agent-critic} for full prompts between the two agents. Agents are instantiated using GPT-4 Turbo.
  • Figure 4: Ground truth (left) vs. predictions by MALADE (right) for OMOP ADE task. Red, green, and white cells represent "increase", "decrease", and "no-effect" labels, respectively.
  • Figure 5: Confusion matrix for MALADE.
  • ...and 9 more figures
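
The Agent-Critic interaction described in the captions of Figures 1 and 3 (an Agent produces a response with a justification, a Critic validates it, and the Agent revises until the Critic is satisfied) can be sketched as a simple loop. This is a minimal illustration, not the paper's code: `Draft`, `run_with_critic`, and the `max_rounds` cap are hypothetical names and choices, and the agent and critic are plain callables standing in for LLM-backed components.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    answer: str         # e.g. "increase", "decrease", or "no-effect"
    justification: str  # the reasoning the Critic inspects


# An agent maps (question, optional critic feedback) to a draft.
AgentFn = Callable[[str, Optional[str]], Draft]
# A critic returns None when satisfied, otherwise feedback text.
CriticFn = Callable[[str, Draft], Optional[str]]


def run_with_critic(
    question: str,
    agent: AgentFn,
    critic: CriticFn,
    max_rounds: int = 3,  # assumed cap so the loop always terminates
) -> Draft:
    """Let the agent revise its draft until the critic accepts it or the
    round limit is reached; the latest draft is returned either way."""
    draft = agent(question, None)
    for _ in range(max_rounds):
        feedback = critic(question, draft)
        if feedback is None:
            break
        draft = agent(question, feedback)
    return draft
```

Returning the latest draft when the round limit is hit is one possible design choice; a stricter variant could raise an error or flag the answer as unverified instead.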