MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models

Siqi Ma, Jiajie Huang, Fan Zhang, Jinlin Wu, Yue Shen, Guohui Fan, Zhu Zhang, Zelin Zang

TL;DR

MedLA tackles trustworthy medical reasoning with large language models by introducing a logic-driven multi-agent framework that encodes reasoning as explicit syllogism-based trees. The system decomposes queries into major and minor premises, delegates subproblems to specialized agents, and conducts multi-round, graph-guided discussions to align and refine the reasoning structure, aided by a credibility module. It achieves state-of-the-art performance on MedDDx and standard medical QA benchmarks, scales across open-source and commercial backbones, and provides interpretable, premise-level traceability without requiring retrieval or fine-tuning. The approach improves reliability, makes logical inconsistencies easier to detect, and increases robustness in clinical decision support, offering a generalizable paradigm for trustworthy medical AI reasoning.

Abstract

Answering complex medical questions requires not only domain expertise and patient-specific information, but also structured and multi-perspective reasoning. Existing multi-agent approaches often rely on fixed roles or shallow interaction prompts, limiting their ability to detect and resolve fine-grained logical inconsistencies. To address this, we propose MedLA, a logic-driven multi-agent framework built on large language models. Each agent organizes its reasoning process into an explicit logical tree based on syllogistic triads (major premise, minor premise, and conclusion), enabling transparent inference and premise-level alignment. Agents engage in a multi-round, graph-guided discussion to compare and iteratively refine their logic trees, achieving consensus through error correction and contradiction resolution. We demonstrate that MedLA consistently outperforms both static role-based systems and single-agent baselines on challenging benchmarks such as MedDDx and standard medical QA tasks. Furthermore, MedLA scales effectively across both open-source and commercial LLM backbones, achieving state-of-the-art performance and offering a generalizable paradigm for trustworthy medical reasoning.
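The logic tree of syllogistic triads described in the abstract can be pictured with a small data structure. The Python sketch below is illustrative only: the names SyllogismNode, flatten, and premise_conflicts are hypothetical and do not come from the paper, the clinical statements are made-up toy inputs, and exact string matching stands in for whatever LLM-based comparison the agents actually perform during discussion.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SyllogismNode:
    """One syllogistic triad; children hold supporting sub-arguments (assumed layout)."""
    major_premise: str  # general rule, e.g. a piece of medical knowledge
    minor_premise: str  # case-specific fact taken from the question
    conclusion: str     # what follows from the two premises
    children: List["SyllogismNode"] = field(default_factory=list)


def flatten(node: SyllogismNode) -> List[SyllogismNode]:
    """Return every triad in the tree in depth-first pre-order."""
    nodes = [node]
    for child in node.children:
        nodes.extend(flatten(child))
    return nodes


def premise_conflicts(
    a: SyllogismNode, b: SyllogismNode
) -> List[Tuple[str, str, str]]:
    """List triads where two trees share a minor premise but conclude differently.

    Exact string equality is a placeholder assumption; a real system would
    need a semantic comparison of premises.
    """
    by_minor = {n.minor_premise: n for n in flatten(a)}
    conflicts = []
    for n in flatten(b):
        other = by_minor.get(n.minor_premise)
        if other is not None and other.conclusion != n.conclusion:
            conflicts.append((n.minor_premise, other.conclusion, n.conclusion))
    return conflicts


# Toy inputs, invented for illustration (not taken from any benchmark).
agent_a = SyllogismNode(
    major_premise="Fever with neck stiffness warrants ruling out meningitis.",
    minor_premise="The patient has fever and neck stiffness.",
    conclusion="Meningitis should be ruled out first.",
    children=[
        SyllogismNode(
            major_premise="Neck stiffness is a sign of meningeal irritation.",
            minor_premise="The patient has neck stiffness.",
            conclusion="Meningeal irritation is present.",
        )
    ],
)
agent_b = SyllogismNode(
    major_premise="Fever with neck stiffness can accompany a viral illness.",
    minor_premise="The patient has fever and neck stiffness.",
    conclusion="A viral illness is the most likely cause.",
)

conflicts = premise_conflicts(agent_a, agent_b)
print(conflicts)
```

Because each disagreement is tied to a specific premise rather than only to the final answer, a later discussion round can focus on that triad, which is the kind of premise-level alignment the abstract refers to.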

Paper Structure

This paper contains 13 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: (a) Comparison between traditional role-based agent discussions and our proposed logic-based framework. (b) Performance and parameter comparison of MedLA with existing systems. (a-Left) Traditional systems (e.g., MedAgent) assign agents fixed roles and aggregate their conclusions, which leads to superficial discussions and makes the root of a disagreement hard to identify. (a-Right) Our approach models each agent's reasoning as a logic tree, enabling inter-agent analysis of logical and knowledge-based inconsistencies. (b) MedLA outperforms existing systems in average accuracy across two benchmarks, demonstrating its effectiveness in handling complex medical reasoning tasks.
  • Figure 2: Overview of the proposed MedLA for complex medical reasoning. The system decomposes a medical query into logical sub-tasks, dynamically invokes specialized agents, and engages in collaborative reasoning to generate comprehensive answers.
  • Figure 3: Performance comparison of MedLA with LLaMA3.1-8B at different levels of difficulty on the MedDDx benchmark. Error bars represent SD.
  • Figure 4: MedLA leads to more effective collaboration between agents than a baseline majority-voting method. Accuracy rises steadily with both agent count and temperature.
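For context on the majority-voting baseline named in Figure 4, the sketch below shows what such a baseline typically does: each agent answers independently and the most frequent answer wins. This is a generic illustration, not the paper's baseline code; the function name majority_vote and the tie-breaking rule are assumptions.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer among independent agents.

    Ties go to the answer that appeared first, which follows from how
    Counter.most_common orders equal counts (an assumed tie-break rule).
    """
    if not answers:
        raise ValueError("at least one answer is required")
    return Counter(answers).most_common(1)[0][0]


# Four hypothetical agents answering one multiple-choice question.
winner = majority_vote(["A", "B", "A", "C"])
print(winner)
```

Voting of this kind only looks at final answers, so it cannot tell which premise caused agents to disagree.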