ProcureGym: A Multi-Agent Markov Game Framework for Modeling National Volume-based Drug Procurement

Jia Wang, Qian Xu, Xuanwen Ding, Zhuangqi Li, Chao He, Bao Liu, Zhongyu Wei

Abstract

In this paper, we introduce ProcureGym, a data-driven multi-agent simulation platform that models China's National Volume-Based Drug Procurement (NVBP) as a Markov Game. Based on real-world data from 7 rounds of NVBP (covering 325 drugs and 2,267 firms), the platform establishes a high-fidelity simulation environment. Within this framework, we evaluate diverse agent models, including Reinforcement Learning (RL), Large Language Model (LLM), and Rule-based algorithms. Experimental results demonstrate that RL agents achieve superior winner alignment and profits. Further analyses show that maximum valid bidding price and procurement volume dominate strategic outcomes. ProcureGym thus serves as a rigorous instrument for assessing policy impacts and formulating future procurement strategies.

Paper Structure (29 sections, 13 figures, 7 tables)

Figures (13)

  • Figure 1: Overview of the ProcureGym Framework.
  • Figure 2: Characteristics of the research dataset. (A-C) Drug characteristics: dosage forms, Anatomical Therapeutic Chemical (ATC) categories, and drugs by procurement round. (D-F) Competition: potential bidders, winners per drug, and winning rates. (G-J) Enterprise attributes: enterprise type, originator versus generic status, in-house active pharmaceutical ingredient production, and the distribution of log-transformed bid prices by enterprise type.
  • Figure 3: Evaluation of NVBP Simulation across 7 Rounds (Rounds 2-9, excluding the insulin-focused Round 6). (A) Price Prediction Accuracy: Log-log scatter plot of predicted vs. actual bid prices (unit: CNY, Chinese yuan); bubble size = number of firms per drug. LOWESS smoothing curves with 95% confidence bands visualize trends; the black diagonal line ($y=x$) represents perfect prediction. (B) Selection Prediction Accuracy: Batch-wise winner alignment rate; Top-K lowest-price ranking predictions vs. actual outcomes. (C) Firm Profit (unit: CNY thousand): Log-scale profit distribution.
  • Figure 4: Sensitivity Analysis Results. Predicted bidding price and firm profit under varying: (A,B) agreed procurement ratios ($\rho$); (C,D) maximum valid bidding prices ($P_{max}$) (unit: CNY); (E,F) agreed procurement volume ($Q_0$) (unit: $10^3$ dosage units); (G,H) actual procurement volume ($Q_e$) (unit: $10^3$ dosage units); (I,J) unit production costs ($C_i$) (unit: CNY). Colored lines: four algorithms; shaded areas: 95% confidence intervals.
  • Figure 5: Comparative evaluation of decision-making performance between GPT-5.4 and Qwen3-235B-A22B-Thinking-2507. The experimental analysis concurrently evaluates the seven drugs listed in Table 2. The left panel illustrates the predicted selection rate, while the right panel depicts firm profit.
  • ...and 8 more figures
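
The winner alignment rate mentioned in the Figure 3(B) caption compares a Top-K lowest-price ranking of predicted bids with the actual winners. The paper's exact definition is not given in this listing, so the following is only a minimal sketch under stated assumptions: K is taken to be the number of actual winners for a drug, ties are broken by firm name, and the function names and sample bids are hypothetical.

```python
from typing import Dict, Set


def top_k_winners(bids: Dict[str, float], k: int) -> Set[str]:
    """Return the k firms with the lowest bids (ties broken by firm name)."""
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    return {firm for firm, _ in ranked[:k]}


def winner_alignment_rate(predicted_bids: Dict[str, float],
                          actual_winners: Set[str]) -> float:
    """Share of actual winners that also appear in the predicted Top-K set.

    Assumption: K equals the number of actual winners for this drug.
    """
    if not actual_winners:
        return 0.0
    predicted = top_k_winners(predicted_bids, k=len(actual_winners))
    return len(predicted & actual_winners) / len(actual_winners)


# Hypothetical predicted bids (CNY) for one drug with two actual winners.
predicted_bids = {"firm_a": 1.20, "firm_b": 0.85, "firm_c": 0.95, "firm_d": 1.50}
actual_winners = {"firm_b", "firm_d"}
rate = winner_alignment_rate(predicted_bids, actual_winners)
```

Here the predicted Top-2 set is `firm_b` and `firm_c`, so one of the two actual winners is matched. A batch-wise rate, as in the caption, could be obtained by averaging this value over the drugs in a procurement round.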