MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis

Chunzheng Zhu, Yangfang Lin, Shen Chen, Yijun Wang, Jianxin Lin

TL;DR

MedEyes tackles the shortcomings of purely on-policy medical visual-CoT by introducing structured off-policy expert trajectories to guide progressive visual reasoning. It couples a gaze-guided reasoning navigator (GRN) with a confidence value sampler (CVS) to emulate clinician search patterns, and uses a dual-stream GRPO objective to balance imitation and autonomous discovery. Across five medical VQA benchmarks, MedEyes achieves state-of-the-art accuracy and improved visual grounding, validating the value of off-policy guidance for interpretable diagnostic reasoning. The approach advances trustworthy medical AI by aligning reasoning steps with explicit image regions, enabling progressive, region-aware diagnosis in radiology and pathology contexts.
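
This summary does not give the exact form of the dual-stream GRPO objective. The following is a minimal sketch that assumes "decoupling" means computing GRPO's group-relative advantages separately for on-policy rollouts and for off-policy expert trajectories. It contrasts that with a single pooled group. The function names and example rewards are hypothetical, and the sketch is not the paper's loss.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: (r - group mean) / (group std + eps)."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; 0.0 for a single reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_stream_advantages(
    on_policy: List[float], off_policy: List[float], eps: float = 1e-6
) -> Tuple[List[float], List[float]]:
    """Hypothetical decoupled variant: each stream is normalized within its own group."""
    return group_advantages(on_policy, eps), group_advantages(off_policy, eps)


def pooled_advantages(
    on_policy: List[float], off_policy: List[float], eps: float = 1e-6
) -> Tuple[List[float], List[float]]:
    """Single-stream baseline: both streams share one mean and std."""
    pooled = group_advantages(list(on_policy) + list(off_policy), eps)
    n = len(on_policy)
    return pooled[:n], pooled[n:]


if __name__ == "__main__":
    on, off = [0.2, 0.4, 0.6], [0.9, 1.0]  # illustrative rewards only
    print("pooled:", pooled_advantages(on, off))
    print("dual-stream:", dual_stream_advantages(on, off))
```

With on-policy rewards [0.2, 0.4, 0.6] and expert rewards [0.9, 1.0], the pooled mean is 0.62. Every on-policy rollout therefore receives a negative advantage, including the best one. Normalizing each stream separately keeps a positive signal for the best on-policy rollout. This illustrates one plausible way a high-reward expert stream could overwhelm on-policy learning.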

Abstract

Accurate medical diagnosis often involves progressive visual focusing and iterative reasoning, characteristics commonly observed in clinical workflows. While recent vision-language models demonstrate promising chain-of-thought (CoT) reasoning capabilities via reinforcement learning with verifiable rewards (RLVR), their purely on-policy learning paradigm tends to reinforce superficially coherent but clinically inaccurate reasoning paths. We propose MedEyes, a novel reinforcement learning framework that dynamically models clinician-style diagnostic reasoning by progressively attending to and interpreting relevant medical image regions. By incorporating off-policy expert guidance, MedEyes converts expert visual search trajectories into structured external behavioral signals, guiding the model toward clinically aligned visual reasoning. We design the Gaze-guided Reasoning Navigator (GRN) to emulate the diagnostic process through a dual-mode exploration strategy: scanning for systematic abnormality localization and drilling for detailed regional analysis. To balance expert imitation and autonomous discovery, we introduce the Confidence Value Sampler (CVS), which employs nucleus sampling and adaptive termination to create diverse yet credible exploration paths. Finally, the dual-stream GRPO optimization framework decouples on-policy and off-policy learning signals, mitigating reward assimilation and entropy collapse. Experiments demonstrate that MedEyes achieves an average performance improvement of +8.5% across multiple medical VQA benchmarks, validating its potential for building interpretable medical AI systems.
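
The abstract says only that CVS combines nucleus sampling with adaptive termination. The sketch below shows those two generic ingredients under explicit assumptions. At each step, a hypothetical `step_fn` returns probabilities over candidate actions (for example, image regions to inspect) together with a scalar confidence. The trajectory stops once that confidence reaches a threshold. The interface, threshold, and step limit are illustrative and are not MedEyes's actual settings.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: history of chosen action indices -> (candidate probs, confidence)
StepFn = Callable[[List[int]], Tuple[List[float], float]]


def nucleus_filter(probs: List[float], top_p: float = 0.9) -> List[int]:
    """Indices of the smallest highest-probability set whose cumulative mass >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def nucleus_sample(probs: List[float], top_p: float, rng: random.Random) -> int:
    """Sample one index from the renormalized nucleus."""
    kept = nucleus_filter(probs, top_p)
    total = sum(probs[i] for i in kept)
    weights = [probs[i] / total for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def sample_trajectory(
    step_fn: StepFn,
    top_p: float = 0.9,
    stop_conf: float = 0.85,  # hypothetical termination threshold
    max_steps: int = 6,       # hypothetical hard cap on reasoning rounds
    seed: int = 0,
) -> List[int]:
    """Hypothetical CVS-style rollout: nucleus sampling plus confidence-based stopping."""
    rng = random.Random(seed)
    history: List[int] = []
    for _ in range(max_steps):
        probs, confidence = step_fn(history)
        if confidence >= stop_conf:  # adaptive termination
            break
        history.append(nucleus_sample(probs, top_p, rng))
    return history


if __name__ == "__main__":
    # Toy step function: fixed candidate probabilities, confidence grows each step.
    def toy_step(history: List[int]) -> Tuple[List[float], float]:
        return [0.6, 0.25, 0.1, 0.05], 0.3 * len(history)

    print(sample_trajectory(toy_step, top_p=0.8))
```

Restricting sampling to the top-p nucleus keeps low-probability candidates out of the trajectory while still allowing variation among plausible ones. The confidence check lets easy cases terminate early instead of always running `max_steps` rounds.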

Paper Structure

This paper contains 26 sections, 9 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison of medical CoT training paradigms: SFT produces overly generic responses that miss critical findings; on-policy CoT allows exploration but suffers from advantage collapse leading to incorrect reasoning; MedEyes achieves accurate pneumothorax identification through systematic visual grounding and targeted regional analysis.
  • Figure 2: Overview of MedEyes. We first generate structured off-policy expert trajectories through the Gaze-guided Reasoning Navigator (GRN) and Confidence Value Sampler (CVS) to explore expert reasoning patterns, then combine these with on-policy rollouts from the policy model. The unified trajectories are subsequently used for optimization via dual-stream GRPO with multi-component verifiable rewards to enhance the model's intrinsic grounding reasoning capability for medical visual understanding.
  • Figure 3: Diagnostic chain-of-thought example of MedEyes. Step 1 identifies the bilateral kidneys as anatomical landmarks, followed by a targeted examination of the liver in step 2. Heatmaps illustrate the progressive refinement of visual attention.
  • Figure 4: Training dynamics of MedEyes. (a) Reward progression highlighting the effectiveness of off-policy expert guidance. (b) Trajectory length showing exploration-efficiency transition in multi-round visual reasoning.
  • Figure 5: Failure case analysis: quantitative measurement errors in tumor sizing (top) and misinterpretation of a pathological concept in ultrasound aneurysm identification (bottom).