EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement

Wenhui Zhu; Xuanzhao Dong; Xin Li; Yujian Xiong; Xiwen Chen; Peijie Qiu; Vamsi Krishna Vasa; Zhangsihao Yang; Yi Su; Oana Dumitrascu; Yalin Wang

TL;DR

EyeBench addresses a critical gap in evaluating retinal fundus image enhancement by introducing a multi-dimensional benchmark that jointly considers full-reference and no-reference quality, plus clinically meaningful downstream tasks. It combines distribution-aligned datasets, expert-guided annotations, and multi-task evaluation to assess how well enhancement methods preserve vessels, lesions, and disease-related information. The study finds that multi-dimensional assessments better reflect clinical preferences than single-metric evaluations, and reveals distinct strengths and trade-offs among paired, unpaired, OT-based, and SDE-based methods. Overall, EyeBench provides a practical framework and insights to guide future development toward clinically relevant retinal image enhancement.

Abstract

Over the past decade, generative models have achieved significant success in enhancing fundus images. However, evaluating these models remains a considerable challenge. A comprehensive evaluation benchmark for fundus image enhancement is indispensable for three main reasons: 1) Existing denoising metrics (e.g., PSNR, SSIM) are hard to extend to downstream real-world clinical research (e.g., vessel morphology consistency). 2) There is a lack of comprehensive evaluation covering both paired and unpaired enhancement methods, and expert protocols are needed to accurately assess clinical value. 3) An ideal evaluation system should provide insights to inform future developments of fundus image enhancement. To this end, we propose a novel comprehensive benchmark, EyeBench, to provide insights that align enhancement models with clinical needs, offering a foundation for future work to improve the clinical relevance and applicability of generative models for fundus image enhancement. EyeBench has three appealing properties: 1) Multi-dimensional, clinically aligned downstream evaluation: In addition to evaluating the enhancement task, we provide several clinically significant downstream tasks for fundus images, including vessel segmentation, DR grading, denoising generalization, and lesion segmentation. 2) Medical expert-guided evaluation design: We introduce a novel dataset that promotes comprehensive and fair comparisons between paired and unpaired methods and includes a manual evaluation protocol by medical experts. 3) Valuable insights: Our benchmark study provides a comprehensive and rigorous evaluation of existing methods across different downstream tasks, assisting medical experts in making informed choices. Additionally, we offer further analysis of the challenges faced by existing methods. The code is available at https://github.com/Retinal-Research/EyeBench.
