Empowering Locally Deployable Medical Agent via State Enhanced Logical Skills for FHIR-based Clinical Tasks

Wanrong Yang; Zhengliang Liu; Yuan Li; Bingjie Yan; Lingfang Li; Mingguang He; Dominik Wojtczak; Yalin Zheng; Danli Shi

TL;DR

This study demonstrates that equipping models with a dynamically updatable, state-enhanced cognitive scaffold is a privacy-preserving and computationally efficient pathway for local adaptation of AI agents to clinical information systems.

Abstract

While Large Language Models demonstrate immense potential as proactive Medical Agents, their real-world deployment is severely bottlenecked by data scarcity under privacy constraints. To overcome this, we propose State-Enhanced Logical-Skill Memory (SELSM), a training-free framework that distills simulated clinical trajectories into entity-agnostic operational rules within an abstract skill space. During inference, a Query-Anchored Two-Stage Retrieval mechanism dynamically fetches these logical priors to guide the agent's step-by-step reasoning, effectively resolving the state polysemy problem. Evaluated on MedAgentBench -- the only authoritative high-fidelity virtual EHR sandbox benchmarked with real clinical data -- SELSM substantially elevates the zero-shot capabilities of locally deployable foundation models (30B--32B parameters). Notably, on the Qwen3-30B-A3B backbone, our framework completely eliminates task chain breakdowns to achieve a 100% completion rate, raising the overall success rate by 22.67 percentage points (absolute) and significantly outperforming existing memory-augmented baselines. This study demonstrates that equipping models with a dynamically updatable, state-enhanced cognitive scaffold is a privacy-preserving and computationally efficient pathway for local adaptation of AI agents to clinical information systems. Although SELSM is currently validated only on FHIR-based EHR interactions as an initial step, its entity-agnostic design provides a principled foundation toward broader clinical deployment.

Paper Structure (26 sections, 17 equations, 6 figures, 3 tables)

Introduction
Methods
Problem Formulation
Phase 1: Logical Skill Distillation
Agent--Environment Interaction Loop
LLM-as-Judge Step-Level Evaluation
Memory Record Construction
Phase 2: Hierarchical Memory Indexing
Phase 3: Query-Anchored Two-Stage Retrieval and Injection
Stage 1: Task-Level Filtering
Stage 2: Transition-Level Ranking
Skill Injection
Evaluation Metrics
Primary Metrics
Derived Metrics
...and 11 more sections

Figures (6)

Figure 1: Overview of the SELSM framework. (Top-left) Cross-institutional deployment context: heterogeneous hospital systems (EHR, LIS, PACS, etc.) operate under institution-specific protocols, while a locally deployed LLM, guided by medical professionals, issues privacy-preserving operational queries. (Top-right) Logical Skill Distillation (Phase 1): the agent interacts with a system simulator in a closed loop, where it observes the current state $s$, executes an action $a$ via the policy $p = \pi(a \mid s)$, and receives the system response $o$. The Logical Skill Generator $\mathcal{G}$ parses each trajectory $\tau = (s, a, o)$ through a context encoder and experience decoder to produce entity-agnostic logical skills $e = \mathcal{G}(\tau \mid \theta)$, comprising Operational Logic, canonical examples, and reasoning traces. (Bottom) Query-Anchored Two-Stage Retrieval (Phase 3): upon receiving a new task, the system first performs Task-Level Query Filtering by scoring stored records $\mathcal{R} = (q, \mathcal{T})$ via query similarity $p_q$, then executes Transition-Level State Ranking over candidate internal states via state similarity $p_s$, and finally integrates the retrieved skill with the current state to guide the agent toward a better operational action.
Figure 2: Failure mode distribution and performance improvement analysis. (a)--(c) Each stacked bar decomposes all 300 tasks into four mutually exclusive outcomes: Correct (task completed with the correct answer), Incorrect (task completed but with a wrong answer), Invalid Action (terminated due to an invalid API call), and Task Limit (terminated after exceeding the maximum number of interaction turns) for GLM4-32B, Qwen3-30B-A3B, and Qwen3-32B, respectively. (d) Absolute improvement of our method over the Baseline in percentage points (pp) across three metrics: Overall Success Rate, Query Success Rate, and Action Success Rate.
Figure 3: Conversation efficiency analysis. (a)--(c) One-Shot Correct Rate (OSR; Eq. \ref{eq:osr}) for each method on GLM4-32B, Qwen3-30B-A3B, and Qwen3-32B, respectively. (d) Average number of conversation turns per task across all three backbone models.
Figure 4: Multi-dimensional comparison of four methods on the Qwen3-30B-A3B backbone. Five normalized dimensions are shown: Success Rate ($\text{SR}/100$; Eq. \ref{eq:sr}), Completion Rate ($\text{TC}/100$; Eq. \ref{eq:tc}), Error Robustness (ER; Eq. \ref{eq:er}), Efficiency ($\text{OSR}/100$; Eq. \ref{eq:osr}), and Query-Action Balance (QAB; Eq. \ref{eq:qab}). A larger enclosed area indicates better overall performance.
Figure 5: Token efficiency analysis. Each point represents one method--model configuration, with the $x$-axis showing $\bar{C}_{\mathrm{tok}}$ (Eq. \ref{eq:token_cost}) and the $y$-axis showing SR (Eq. \ref{eq:sr}). Colors denote methods (Baseline, A-Mem, ExpeL, Ours) and marker shapes denote backbone models (square: GLM4-32B, circle: Qwen3-30B-A3B, triangle: Qwen3-32B). Points closer to the upper-left corner indicate higher accuracy at lower token cost.
...and 1 more figures
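
The two-stage retrieval flow described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Record` and `Transition` classes, the `retrieve_skill` function, and the `top_k` parameter are hypothetical names, the word-overlap `similarity` is only a stand-in for whatever $p_q$ and $p_s$ actually compute, and the sample memory entries are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transition:
    state: str  # abstract description of an internal state (hypothetical format)
    skill: str  # entity-agnostic logical skill stored for that state


@dataclass
class Record:
    query: str                     # task query q
    transitions: List[Transition]  # stored transitions T for that task


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets.

    Assumption: a simple stand-in for the similarity scores p_q and p_s.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_skill(memory: List[Record], query: str, state: str,
                   top_k: int = 2) -> Optional[str]:
    # Stage 1: task-level filtering, keep the top_k records by query similarity.
    candidates = sorted(memory, key=lambda r: similarity(query, r.query),
                        reverse=True)[:top_k]
    # Stage 2: transition-level ranking over the surviving records' states.
    transitions = [t for r in candidates for t in r.transitions]
    if not transitions:
        return None
    best = max(transitions, key=lambda t: similarity(state, t.state))
    return best.skill


# Invented example memory, for illustration only.
memory = [
    Record("record a blood pressure observation for the patient", [
        Transition("patient id resolved, no observation posted yet",
                   "POST the observation with required fields before finishing"),
    ]),
    Record("find the most recent magnesium lab value", [
        Transition("patient id resolved, lab results not fetched yet",
                   "GET lab observations sorted by date"),
        Transition("lab results fetched",
                   "report the newest value and finish"),
    ]),
]

skill = retrieve_skill(
    memory,
    query="find the latest magnesium lab value for the patient",
    state="patient id resolved, lab results not fetched yet",
    top_k=1,
)
```

Filtering by query before ranking by state means that two tasks sharing a similar-looking intermediate state do not compete with each other in Stage 2, which is one plausible reading of how the query anchor addresses state polysemy. The retrieved `skill` string would then be combined with the current state in the agent's prompt.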
