Table of Contents
Fetching ...

IRFundusSet: An Integrated Retinal Fundus Dataset with a Harmonized Healthy Label

P. Bilha Githinji, Keming Zhao, Jiantao Wang, Peiwu Qin

TL;DR

This work presents Integrated Retinal Fundus Set (IRFundusSet), a dataset that consolidates, harmonizes and curates several public datasets, facilitating their consumption as a unified whole and with a consistent is_normal label.

Abstract

Ocular conditions are a global concern and computational tools utilizing retinal fundus color photographs can aid in routine screening and management. Obtaining comprehensive and sufficiently sized datasets, however, is non-trivial for the intricate retinal fundus, which exhibits heterogeneities within pathologies, in addition to variations from demographics and acquisition. Moreover, retinal fundus datasets in the public space suffer fragmentation in the organization of data and definition of a healthy observation. We present Integrated Retinal Fundus Set (IRFundusSet), a dataset that consolidates, harmonizes and curates several public datasets, facilitating their consumption as a unified whole and with a consistent is_normal label. IRFundusSet comprises a Python package that automates harmonization and avails a dataset object in line with the PyTorch approach. Moreover, images are physically reviewed and a new is_normal label is annotated for a consistent definition of a healthy observation. Ten public datasets are initially considered with a total of 46064 images, of which 25406 are curated for a new is_normal label and 3515 are deemed healthy across the sources.

IRFundusSet: An Integrated Retinal Fundus Dataset with a Harmonized Healthy Label

TL;DR

This work presents Integrated Retinal Fundus Set (IRFundusSet), a dataset that consolidates, harmonizes and curates several public datasets, facilitating their consumption as a unified whole and with a consistent is_normal label.

Abstract

Ocular conditions are a global concern and computational tools utilizing retinal fundus color photographs can aid in routine screening and management. Obtaining comprehensive and sufficiently sized datasets, however, is non-trivial for the intricate retinal fundus, which exhibits heterogeneities within pathologies, in addition to variations from demographics and acquisition. Moreover, retinal fundus datasets in the public space suffer fragmentation in the organization of data and definition of a healthy observation. We present Integrated Retinal Fundus Set (IRFundusSet), a dataset that consolidates, harmonizes and curates several public datasets, facilitating their consumption as a unified whole and with a consistent is_normal label. IRFundusSet comprises a Python package that automates harmonization and avails a dataset object in line with the PyTorch approach. Moreover, images are physically reviewed and a new is_normal label is annotated for a consistent definition of a healthy observation. Ten public datasets are initially considered with a total of 46064 images, of which 25406 are curated for a new is_normal label and 3515 are deemed healthy across the sources.
Paper Structure (15 sections, 2 figures, 3 tables)

This paper contains 15 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Flow chart depicting the process for curating is_normal label
  • Figure 2: The first plot summarizes the aspect ratio of the images. The second plot is a t-SNE visualization of the harmonized dataset exploring for variations that isolate the newly curated is_normal label