MediSee: Reasoning-based Pixel-level Perception in Medical Images
Qinyue Tong, Ziqian Lu, Jun Liu, Yangming Zheng, Zheming Lu
TL;DR
This work defines Medical Reasoning Segmentation and Detection (MedSD), a task in which a model segments and localizes targets from implicit, knowledge-driven medical queries. It introduces the MLMR-SD dataset (over 200K QA pairs and 12,652 image-mask pairs covering 109 medical objects) and MediSee, a baseline that fuses multiple candidate token features via Adaptive Democratic Candidate Fusion and uses similarity-map supervision to strengthen reasoning. On MLMR-SD and the traditional SA-Med2D benchmark, MediSee achieves superior segmentation and detection performance while also providing textual explanations, addressing the lack of interactivity in existing medical perception methods. The approach advances interactive, reasoning-based medical image understanding, with potential impact on clinical practice and research.
Abstract
Despite remarkable advances in pixel-level medical image perception, existing methods are either limited to specific tasks or rely heavily on accurate bounding boxes or text labels as input prompts. However, the medical knowledge needed to provide such inputs is a major obstacle for the general public, which greatly limits the applicability of these methods. Rather than supplying such domain-specific auxiliary information, general users tend to pose colloquial queries that require logical reasoning. In this paper, we introduce a novel medical vision task, Medical Reasoning Segmentation and Detection (MedSD), which aims to comprehend implicit queries about medical images and generate the corresponding segmentation mask and bounding box for the target object. To accomplish this task, we first introduce the Multi-perspective, Logic-driven Medical Reasoning Segmentation and Detection (MLMR-SD) dataset, which contains a large collection of medical entity targets together with their corresponding reasoning. Furthermore, we propose MediSee, an effective baseline model for medical reasoning segmentation and detection. Experimental results indicate that the proposed method effectively addresses MedSD with implicit colloquial queries and outperforms traditional medical referring segmentation methods.
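MedSD asks for both a segmentation mask and a bounding box for the same target. As an illustration of how the two outputs relate, the minimal sketch below derives the tightest axis-aligned box enclosing a binary mask. This is a common, generic technique and an assumption for illustration only; the function name `mask_to_bbox` and the (x_min, y_min, x_max, y_max) inclusive pixel convention are hypothetical, and this is not claimed to be how MediSee predicts boxes or how MLMR-SD boxes were produced.

```python
from typing import Optional, Tuple

import numpy as np


def mask_to_bbox(mask: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
    """Hypothetical helper: tightest box around the nonzero pixels of a 2-D mask.

    Returns (x_min, y_min, x_max, y_max) with inclusive pixel coordinates,
    or None if the mask is empty (no target present).
    """
    # np.nonzero on a 2-D array returns (row indices, column indices)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy 6x8 mask whose object occupies rows 2..4 and columns 3..6
mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1
print(mask_to_bbox(mask))  # (3, 2, 6, 4)
```

Returning None for an empty mask keeps "no target found" distinct from a degenerate one-pixel box.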
