GRPO++: Enhancing Dermatological Reasoning under Low Resource Settings
Ismam Nur Swapnil, Aranya Saha, Tanvir Ahmed Khan, Mohammad Ariful Haque
TL;DR
DermIQ-VLM addresses the challenge of building reliable, explainable dermatology vision-language models under data-scarce, low-resource conditions. It introduces a resource-efficient, multi-stage pipeline combining a stabilized GRPO++ reinforcement-learning stage for reasoning, supervised fine-tuning for conversational capability, and grounding via Knowledge Graphs with DPO to internalize reliable patterns. A curated dermatology dataset supports stage-specific training, and experiments show DermIQ-VLM outperforms baselines in disease detection and conversational quality, achieving notable gains with both smaller and larger backbones. This work demonstrates a feasible pathway for deploying specialized, trustworthy VLMs in resource-constrained clinical settings, with potential to improve diagnostic support and clinician trust through grounded, stepwise reasoning.
Abstract
Vision-Language Models (VLMs) show promise in medical image analysis, yet their capacity for structured reasoning in complex domains like dermatology is often limited by data scarcity and the high computational cost of advanced training techniques. To address these challenges, we introduce DermIQ-VLM, a VLM developed through a multi-stage, resource-efficient methodology designed to emulate a dermatologist's diagnostic process. Our primary contribution is a modified version of Group Relative Policy Optimization (GRPO), called GRPO++, which stabilizes the powerful but data-intensive GRPO framework. Our proposed training pipeline first employs GRPO++ for reasoning-oriented disease recognition, followed by supervised fine-tuning for conversational ability. To mitigate factual errors introduced during the supervised fine-tuning step, we then align the model using Direct Preference Optimization (DPO), leveraging a Knowledge Graph-based system as a scalable proxy for expert preference. A preliminary evaluation on a curated dermatological dataset demonstrates that our proposed methodology yields notable performance gains over standard fine-tuning approaches. These findings support the potential of our pipeline as a feasible pathway for developing specialized, reliable VLMs in resource-constrained environments.
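For readers unfamiliar with the two optimization objectives named above, the following is a minimal sketch of their standard forms: the group-relative advantage used by vanilla GRPO and the per-pair DPO loss. It does not reproduce the GRPO++ modifications, which the abstract does not detail. The function names, the use of the population standard deviation, the `eps` constant, and the default `beta` are illustrative assumptions rather than settings taken from this work.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: normalize each reward within its group.

    `rewards` holds the scalar rewards of all responses sampled for ONE prompt.
    Population std and `eps` are assumptions made for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Inputs are sequence log-probabilities under the policy and under the
    frozen reference model. `beta` = 0.1 is an assumed, illustrative value.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Four sampled answers for one image: two correct (reward 1), two wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # Policy already prefers the chosen answer more than the reference does.
    print(dpo_loss(-5.0, -9.0, -6.0, -8.0))
```

In a pipeline like the one described, the preference pairs passed to a DPO-style loss would come from the Knowledge Graph-based scoring system rather than from human annotators; how that system ranks responses is outside the scope of this sketch.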
