MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution

Jianwen Chen, Xinyu Yang, Peng Xia, Arian Azarang, Yueh Z Lee, Gang Li, Hongtu Zhu, Yun Li, Beidi Chen, Huaxiu Yao

TL;DR

MedVerse reformulates medical reasoning as a DAG-structured process that can be executed as a Petri net, enabling parallel exploration of differential diagnoses while preserving causal order. The framework combines a data-curation pipeline (MedVerse Curator), a topology-aware attention mechanism (MedVerse Attention), and a parallel inference engine (MedVerse Engine). Empirically, it improves strong general-purpose LLMs by up to 8.9% in accuracy and, compared with specialized medical LLMs, achieves comparable accuracy with about 1.3x lower inference latency and roughly 1.7x higher generation throughput on standard medical benchmarks, suggesting practical potential for real-world clinical decision support. The work presents an end-to-end, medically grounded approach to scalable and reliable medical reasoning, and the code is open source.

Abstract

Large language models (LLMs) have demonstrated strong performance and rapid progress in a wide range of medical reasoning tasks. However, their sequential autoregressive decoding forces inherently parallel clinical reasoning, such as differential diagnosis, into a single linear reasoning path, limiting both efficiency and reliability for complex medical problems. To address this, we propose MedVerse, a reasoning framework for complex medical inference that reformulates medical reasoning as a parallelizable directed acyclic graph (DAG) process based on Petri net theory. The framework adopts a full-stack design across data, model architecture, and system execution. For data creation, we introduce the MedVerse Curator, an automated pipeline that synthesizes knowledge-grounded medical reasoning paths and transforms them into Petri net-structured representations. At the architectural level, we propose a topology-aware attention mechanism with adaptive position indices that supports parallel reasoning while preserving logical consistency. At the system level, we develop a customized inference engine that supports parallel execution without additional overhead. Empirical evaluations show that MedVerse improves strong general-purpose LLMs by up to 8.9%. Compared to specialized medical LLMs, MedVerse achieves comparable performance while delivering a 1.3x reduction in inference latency and a 1.7x increase in generation throughput, enabled by its parallel decoding capability. Code is available at https://github.com/aiming-lab/MedVerse.
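The abstract does not spell out how the topology-aware attention and adaptive position indices work. The following is a minimal sketch of one plausible layout, assuming a sequence made of a shared prefix (question and plan), K parallel branches, and a conclusion segment: each branch attends to the prefix and causally to itself but never to a sibling branch, every branch restarts its position indices right after the prefix, and the conclusion attends to everything. The function name build_mask_and_positions and this exact layout are hypothetical illustrations, not MedVerse's implementation.

```python
import numpy as np


def build_mask_and_positions(prefix_len, branch_lens, concl_len):
    """Hypothetical mask/position builder for [prefix | branch_1 .. branch_K | conclusion].

    mask[i, j] is True when token i may attend to token j.
    pos[i] is the position index assigned to token i.
    """
    total = prefix_len + sum(branch_lens) + concl_len
    mask = np.zeros((total, total), dtype=bool)
    pos = np.zeros(total, dtype=np.int64)

    # Shared prefix: ordinary causal attention, positions 0..prefix_len-1.
    mask[:prefix_len, :prefix_len] = np.tril(np.ones((prefix_len, prefix_len), dtype=bool))
    pos[:prefix_len] = np.arange(prefix_len)

    start = prefix_len
    for n in branch_lens:
        end = start + n
        # Each branch sees the whole prefix...
        mask[start:end, :prefix_len] = True
        # ...and itself causally, but no sibling branch.
        mask[start:end, start:end] = np.tril(np.ones((n, n), dtype=bool))
        # Assumed "adaptive" indexing: every branch restarts right after the prefix.
        pos[start:end] = prefix_len + np.arange(n)
        start = end

    # Conclusion: sees the prefix and all branches, and itself causally.
    mask[start:, :start] = True
    mask[start:, start:] = np.tril(np.ones((concl_len, concl_len), dtype=bool))
    # It is placed after the longest branch.
    pos[start:] = prefix_len + max(branch_lens, default=0) + np.arange(concl_len)
    return mask, pos
```

For example, build_mask_and_positions(2, [2, 3], 1) yields position indices [0, 1, 2, 3, 2, 3, 4, 5]: both branches start at index 2, so neither is positioned after the other, which is what would let them be decoded side by side under this assumed layout.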

Paper Structure (33 sections, 3 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Limitations of sequential chain-of-thought reasoning in medical diagnosis. (a) Accuracy: Linear execution suffers from contextual pollution due to early incorrect hypotheses (red), whereas parallel reasoning preserves correct inference paths (green). (b) Efficiency: Sequential reasoning repeatedly processes overlapping evidence, leading to redundant computation (orange). (c) Interpretability: Unstructured chain-of-thought (red) obscures explicit causal dependencies due to a structural mismatch with parallel clinical reasoning.
  • Figure 2: Illustration of the topological modeling process. The framework first extracts a structured clinical topological DAG from an unstructured question-answer pair, capturing causal dependencies. This DAG is then formally mapped into an executable Petri net, in which reasoning states are instantiated as places and their dependency relations are realized through transitions.
  • Figure 3: Example of the structured generation flow in MedVerse. The reasoning process is explicitly decomposed into planning, execution, and conclusion stages.
  • Figure 4: Efficiency Metrics. (a) Average latency and relative speedup (orange line) across five datasets; MedVerse achieves lower latency than the baseline on all five. (b) Throughput vs. Sequence Length. Our method scales better, maintaining higher throughput as sequence length increases.
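Figure 2 describes mapping the reasoning DAG into a Petri net whose places are reasoning states and whose transitions realize dependencies. The sketch below illustrates that idea under stated assumptions. It uses one place per DAG node and one transition per non-source node, and it marks the source nodes (the given findings) initially. Firing leaves input tokens in place (read-arc semantics), so a derived finding stays available to every later step. The actual MedVerse token semantics are not given here. All transitions enabled in the same round fire together, so the number of rounds follows the DAG's depth rather than its node count. The node names and the helpers dag_to_petri_net and run_in_waves are hypothetical.

```python
def dag_to_petri_net(nodes, edges):
    """Map a reasoning DAG to a Petri net (hypothetical helper).

    Each DAG node becomes a place; each non-source node v gets a transition
    t_v whose input places are v's parents and whose output place is v.
    Source nodes (given findings) receive an initial token.
    """
    parents = {n: set() for n in nodes}
    for u, v in edges:
        parents[v].add(u)
    transitions = {f"t_{v}": (frozenset(ps), v) for v, ps in parents.items() if ps}
    initial_marking = {v for v, ps in parents.items() if not ps}
    return transitions, initial_marking


def run_in_waves(transitions, initial_marking):
    """Fire every enabled transition per round; return the transitions fired in each round.

    Input tokens are not consumed (read-arc semantics, an assumption of this sketch).
    """
    marking = set(initial_marking)  # copy so the caller's marking is untouched
    fired = set()
    waves = []
    while True:
        enabled = sorted(
            t for t, (inputs, _) in transitions.items()
            if t not in fired and inputs <= marking
        )
        if not enabled:
            return waves
        # Transitions enabled by the same marking are independent: fire them together.
        for t in enabled:
            marking.add(transitions[t][1])
            fired.add(t)
        waves.append(enabled)


# Hypothetical differential-diagnosis DAG.
NODES = ["symptoms", "labs", "dx_pneumonia", "dx_pe", "dx_hf", "answer"]
EDGES = [
    ("symptoms", "dx_pneumonia"), ("labs", "dx_pneumonia"),
    ("symptoms", "dx_pe"),
    ("symptoms", "dx_hf"), ("labs", "dx_hf"),
    ("dx_pneumonia", "answer"), ("dx_pe", "answer"), ("dx_hf", "answer"),
]

transitions, marking0 = dag_to_petri_net(NODES, EDGES)
waves = run_in_waves(transitions, marking0)
print(waves)  # [['t_dx_hf', 't_dx_pe', 't_dx_pneumonia'], ['t_answer']]
```

Here four reasoning steps complete in two rounds. The three differential-diagnosis branches run in parallel, and the final answer fires only once all of them are available, which preserves causal order.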