MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution
Jianwen Chen, Xinyu Yang, Peng Xia, Arian Azarang, Yueh Z Lee, Gang Li, Hongtu Zhu, Yun Li, Beidi Chen, Huaxiu Yao
TL;DR
MedVerse recasts medical reasoning as a DAG-structured process executed as a Petri net, enabling parallel exploration of differential diagnoses while preserving causal order. The framework combines a data-curation pipeline (MedVerse Curator), a topology-aware attention mechanism (MedVerse Attention), and a parallel inference engine (MedVerse Engine). On standard medical benchmarks, it improves the accuracy of strong general-purpose LLMs by up to 8.9% and, relative to specialized medical LLMs, achieves comparable accuracy with about 1.3x lower inference latency and roughly 1.7x higher generation throughput. These results suggest practical potential for real-world clinical decision support. The code is open source.
Abstract
Large language models (LLMs) have shown strong performance and rapid progress across a wide range of medical reasoning tasks. However, their sequential autoregressive decoding forces inherently parallel clinical reasoning, such as differential diagnosis, into a single linear reasoning path, limiting both efficiency and reliability on complex medical problems. To address this, we propose MedVerse, a reasoning framework for complex medical inference that reformulates medical reasoning as a parallelizable directed acyclic graph (DAG) process based on Petri net theory. The framework adopts a full-stack design spanning data, model architecture, and system execution. For data creation, we introduce the MedVerse Curator, an automated pipeline that synthesizes knowledge-grounded medical reasoning paths and transforms them into Petri net-structured representations. At the architectural level, we propose a topology-aware attention mechanism with adaptive position indices that supports parallel reasoning while preserving logical consistency. At the system level, we develop a customized inference engine that supports parallel execution without additional overhead. Empirical evaluations show that MedVerse improves strong general-purpose LLMs by up to 8.9%. Compared to specialized medical LLMs, MedVerse achieves comparable performance while delivering a 1.3x reduction in inference latency and a 1.7x increase in generation throughput, enabled by its parallel decoding capability. Code is available at https://github.com/aiming-lab/MedVerse.
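
To make the DAG formulation concrete, the following is a minimal illustrative sketch, not the MedVerse implementation. It assumes a reasoning trace is given as a mapping from each step to its prerequisite steps, and it applies a Petri-net-style firing rule: a step becomes enabled once all of its prerequisites have completed, and all enabled steps form one parallel wave. The function names (`parallel_firing_schedule`, `visible_steps`) and the example case are hypothetical. The second helper shows one plausible reading of topology-aware visibility, in which a step may only see its ancestors in the graph; the abstract does not specify the actual attention rule or position-index scheme.

```python
def parallel_firing_schedule(deps):
    """Group reasoning steps into waves that could run in parallel.

    deps maps each step name to the set of steps it depends on.
    A step is enabled (may fire) once all its prerequisites are done.
    """
    steps = set(deps) | {d for pre in deps.values() for d in pre}
    remaining = {s: set(deps.get(s, ())) for s in steps}
    done = set()
    waves = []
    while remaining:
        # All steps whose prerequisites are satisfied fire together.
        ready = sorted(s for s, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError("cycle detected: the graph is not a DAG")
        waves.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return waves


def visible_steps(deps, step):
    """Return every ancestor of `step` (assumed visibility rule)."""
    seen = set()
    stack = list(deps.get(step, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(deps.get(cur, ()))
    return seen


# Hypothetical differential diagnosis: three hypotheses are explored
# independently from shared findings, then merged into a conclusion.
deps = {
    "findings": set(),
    "hypothesis_pneumonia": {"findings"},
    "hypothesis_pe": {"findings"},
    "hypothesis_chf": {"findings"},
    "conclusion": {"hypothesis_pneumonia", "hypothesis_pe", "hypothesis_chf"},
}

waves = parallel_firing_schedule(deps)
for i, wave in enumerate(waves):
    print(f"wave {i}: {wave}")
```

In this toy case the three hypothesis branches land in the same wave, so they could be decoded concurrently, while the conclusion waits for all of them. Under the assumed visibility rule, each branch sees the shared findings but not its sibling branches, which is one way causal order could be preserved during parallel exploration.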
