Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations
Peng Xia, Ming Hu, Feilong Tang, Wenxue Li, Wenhao Zheng, Lie Ju, Peibo Duan, Huaxiu Yao, Zongyuan Ge
TL;DR
Diabetic Retinopathy (DR) grading models suffer from domain shifts caused by differences in imaging conditions, demographics, and diagnostic criteria, which limits their deployment in diverse clinical settings. The authors propose DECO, a disentangled representation framework that separates DR-relevant retinal semantics from domain noise and recombines them across domains to synthesize diverse, domain-invariant features. To stabilize learning, class prototypes refine the semantic content and domain prototypes regularize the domain noise, with data-aware interpolation weights that emphasize rare classes and domains. A robust pixel-level semantic alignment loss then aligns the decoupled retinal semantics, keeping class features dense while preserving intra-class diversity. On GDRBench, DECO generalizes better to unseen domains than prior state-of-the-art methods, with particular improvements on underrepresented datasets. Code is available at https://github.com/richard-peng-xia/DECO.
Abstract
Diabetic Retinopathy (DR), induced by diabetes, poses a significant risk of visual impairment. Accurate and effective grading of DR aids in the treatment of this condition. Yet existing models experience notable performance degradation on unseen domains due to domain shifts. Previous methods address this issue by simulating domain style through simple visual transformations and by mitigating domain noise through learning robust representations. However, domain shifts encompass more than image styles, and these methods overlook biases caused by implicit factors such as ethnicity, age, and diagnostic criteria. In our work, we propose a novel framework in which representations of paired data from different domains are decoupled into semantic features and domain noise. The resulting augmented representation combines the original retinal semantics with domain noise from other domains, aiming to generate enhanced representations that incorporate rich information from diverse domains and are aligned with real-world clinical needs. Subsequently, to improve the robustness of the decoupled representations, class and domain prototypes are employed to interpolate the disentangled representations, while data-aware weights are designed to focus on rare classes and domains. Finally, we devise a robust pixel-level semantic alignment loss to align the retinal semantics decoupled from the features, maintaining a balance between intra-class diversity and dense class features. Experimental results on multiple benchmarks demonstrate the effectiveness of our method on unseen domains. The code is available at https://github.com/richard-peng-xia/DECO.
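To make the recombination idea concrete, the following is a minimal illustrative sketch, not the DECO implementation. It assumes a feature vector whose first `semantic_dim` entries stand in for retinal semantics and whose remaining entries stand in for domain noise; in the actual method the decoupling is learned, and the way the data-aware weight is computed is not specified in this abstract. The names `split_feature`, `recombine`, and `interpolate` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_feature(feature: Sequence[float], semantic_dim: int) -> Tuple[Vector, Vector]:
    """Split a feature into (semantic, domain_noise) parts.

    Assumption: a fixed index split stands in for the learned decoupling.
    """
    if not 0 < semantic_dim < len(feature):
        raise ValueError("semantic_dim must lie strictly inside the feature length")
    return list(feature[:semantic_dim]), list(feature[semantic_dim:])


def recombine(
    feature_a: Sequence[float], feature_b: Sequence[float], semantic_dim: int
) -> Tuple[Vector, Vector]:
    """Swap domain noise between two features drawn from different domains.

    Each output keeps its own semantics but carries the other sample's noise.
    """
    if len(feature_a) != len(feature_b):
        raise ValueError("paired features must have the same length")
    sem_a, noise_a = split_feature(feature_a, semantic_dim)
    sem_b, noise_b = split_feature(feature_b, semantic_dim)
    return sem_a + noise_b, sem_b + noise_a


def interpolate(vector: Sequence[float], prototype: Sequence[float], weight: float) -> Vector:
    """Move a vector toward a prototype by `weight` in [0, 1].

    Assumption: `weight` is supplied by the caller; a data-aware scheme would
    choose larger weights for rare classes or domains.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if len(vector) != len(prototype):
        raise ValueError("vector and prototype must have the same length")
    return [(1.0 - weight) * v + weight * p for v, p in zip(vector, prototype)]


if __name__ == "__main__":
    # Two toy features from different domains: two semantic dims, one noise dim.
    aug_a, aug_b = recombine([1.0, 2.0, 9.0], [3.0, 4.0, 7.0], semantic_dim=2)
    print(aug_a, aug_b)
    # Pull a semantic part a quarter of the way toward a class prototype.
    print(interpolate([0.0, 2.0], [4.0, 2.0], weight=0.25))
```

In this toy setting, the augmented feature for sample A keeps A's semantic entries and takes B's noise entry, which mirrors the abstract's description of pairing original retinal semantics with domain noise from another domain.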
