Pathology-Aware Prototype Evolution via LLM-Driven Semantic Disambiguation for Multicenter Diabetic Retinopathy Diagnosis
Chunzheng Zhu, Yangfang Lin, Jialin Shao, Jianxin Lin, Yijun Wang
TL;DR
This work tackles semantic confusion and domain shift in diabetic retinopathy (DR) grading by evolving visual prototypes with pathology-informed semantics. It introduces the Hierarchical Anchor Prototype Modulation (HAPM) framework, which combines a variance spectrum-driven anchor prototype library, hierarchical differential prompt gating over prompts from large vision-language model (LVLM) and large language model (LLM) sources, and a two-stage modulation using a Pathological Semantic Injector (PSI) and a Discriminative Prototype Enhancer (DPE) to inject detailed pathological knowledge into prototypes. Cross-domain experiments on eight public DR datasets show HAPM outperforming state-of-the-art methods, particularly in distinguishing adjacent DR grades and borderline cases, while keeping the backbone frozen for robustness. The approach also improves interpretability by aligning textual pathology descriptors with visual features, offering a clinically meaningful route to robust cross-domain DR diagnosis.
Abstract
Diabetic retinopathy (DR) grading plays a critical role in early clinical intervention and vision preservation. Recent work focuses predominantly on extracting visual lesion features through data processing and domain decoupling strategies. However, these methods generally overlook domain-invariant pathological patterns and underutilize the rich contextual knowledge of foundation models; they rely solely on visual information, which is insufficient for distinguishing subtle pathological variations. We therefore integrate fine-grained pathological descriptions that complement visual prototypes with additional context, resolving ambiguities in borderline cases. Specifically, we propose a Hierarchical Anchor Prototype Modulation (HAPM) framework for DR grading. First, we introduce a variance spectrum-driven anchor prototype library that preserves domain-invariant pathological patterns. Second, we employ a hierarchical differential prompt gating mechanism that dynamically selects discriminative semantic prompts from both large vision-language model (LVLM) and large language model (LLM) sources to address semantic confusion between adjacent DR grades. Finally, we use a two-stage prototype modulation strategy that progressively integrates clinical knowledge into visual prototypes through a Pathological Semantic Injector (PSI) and a Discriminative Prototype Enhancer (DPE). Extensive experiments across eight public datasets demonstrate that our approach achieves pathology-guided prototype evolution while outperforming state-of-the-art methods. The code is available at https://github.com/zhcz328/HAPM.
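
The general idea of complementing visual prototypes with textual context can be illustrated with a toy sketch. This is not the HAPM implementation: the convex blend in `inject_text`, the `gate` parameter, the 2-D features, and the assumption that text embeddings live in the same space as visual features are all hypothetical choices made for illustration. It only shows how per-grade text embeddings can pull apart visual prototypes that are nearly collinear, as happens for adjacent grades.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per grade; every grade needs at least one sample."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def inject_text(prototypes, text_emb, gate=0.5):
    """Hypothetical fusion: convex blend of unit-norm visual and text vectors.
    gate=0 keeps the visual prototypes, gate=1 uses the text embeddings only."""
    blended = (1.0 - gate) * l2_normalize(prototypes) + gate * l2_normalize(text_emb)
    return l2_normalize(blended)

def classify(features, prototypes):
    """Assign each sample to the prototype with the highest cosine similarity."""
    sims = l2_normalize(features) @ l2_normalize(prototypes).T
    return sims.argmax(axis=1)

# Toy data (assumed): two adjacent grades whose visual features nearly overlap.
features = np.array([[1.0, 0.0], [1.0, 0.2], [0.9, 0.3], [0.8, 0.5]])
labels = np.array([0, 0, 1, 1])
# Assumed text embeddings of per-grade pathology descriptions, same space as features.
text_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

visual = class_prototypes(features, labels, num_classes=2)
fused = inject_text(visual, text_emb, gate=0.5)

# Cosine similarity between the two grade prototypes, before and after fusion.
cos_visual = float(l2_normalize(visual)[0] @ l2_normalize(visual)[1])
cos_fused = float(fused[0] @ fused[1])
print(f"prototype cosine: visual={cos_visual:.3f}, fused={cos_fused:.3f}")
```

A fixed scalar gate is the simplest possible choice here; in this toy it separates the prototypes but is not guaranteed to improve predictions, which is why a learned or sample-dependent gate would be the natural next step in a real system.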
