BCN20000: Dermoscopic Lesions in the Wild
Marc Combalia, Noel C. F. Codella, Veronica Rotemberg, Brian Helba, Veronica Vilaplana, Ofer Reiter, Cristina Carrera, Alicia Barreiro, Allan C. Halpern, Susana Puig, Josep Malvehy
TL;DR
The work presents BCN20000, a large, clinically sourced dermoscopic image dataset for unconstrained skin-lesion classification. It covers challenging cases such as lesions on nails and mucosa, large lesions, and hypopigmented lesions. The dataset combines systematic data collection from 2010 to 2016, automated organization and filtering with manual validation, and linkage to histopathological diagnoses and patient metadata. It was collected under ethics approval, targets cross-source generalization, and serves as a practical benchmark for the ISIC 2019 Challenge, supporting more robust dermatology AI systems. Overall, BCN20000 aims to better reflect real-world clinical practice and to drive better performance on diverse dermoscopic data.
Abstract
This article summarizes the BCN20000 dataset, which consists of 19,424 dermoscopic images of skin lesions captured from 2010 to 2016 at the Hospital Clínic in Barcelona. With this dataset, we aim to study the problem of unconstrained classification of dermoscopic images of skin cancer. This includes lesions in hard-to-diagnose locations (nails and mucosa), large lesions that do not fit within the aperture of the dermoscopy device, and hypopigmented lesions. The BCN20000 dataset will be provided to participants of the ISIC Challenge 2019, who will be asked to train algorithms that automatically classify dermoscopic images of skin cancer.
