BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients

Maria de la Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco García-García, Marisa Caparrós, Germán González, Jose María Salinas

TL;DR

This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank containing chest X-ray (CXR) images (CR, DX) and computed tomography (CT) images of COVID-19+ patients, along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, and polymerase chain reaction (PCR), immunoglobulin G (IgG), and immunoglobulin M (IgM) test results.

Abstract

This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray (CXR) images (CR, DX) and computed tomography (CT) images of COVID-19+ patients, along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, and polymerase chain reaction (PCR), immunoglobulin G (IgG), and immunoglobulin M (IgM) diagnostic test results. The findings have been mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast to the much smaller number of entities annotated in previous datasets. Images are stored in high resolution, entities are localized with anatomical labels, and the data are organized in the Medical Imaging Data Structure (MIDS) format. In addition, 10 images were annotated by a team of radiologists to include semantic segmentation of radiological findings. This first iteration of the database includes 1,380 CR, 885 DX, and 163 CT studies from 1,311 COVID-19+ patients. This is, to the best of our knowledge, the largest COVID-19+ image dataset available in an open format. The dataset can be downloaded from http://bimcv.cipf.es/bimcv-projects/bimcv-covid19.
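
To put the study counts quoted in the abstract in proportion, the short sketch below adds them up and derives the mean number of studies per patient and the share of each modality. It uses only the four numbers stated above. The variable and function names are illustrative, and the first count is assumed to refer to CR (computed radiography) studies.

```python
# Study counts quoted in the abstract for the first iteration of the dataset.
# Assumption: the first figure (1,380) refers to CR studies.
study_counts = {"CR": 1380, "DX": 885, "CT": 163}
n_patients = 1311


def modality_share(counts):
    """Return the fraction of all studies contributed by each modality."""
    total = sum(counts.values())
    return {modality: n / total for modality, n in counts.items()}


total_studies = sum(study_counts.values())
# Simple ratio of studies to patients; it says nothing about how the
# studies are actually distributed across individual patients.
studies_per_patient = total_studies / n_patients
shares = modality_share(study_counts)

print(f"total studies: {total_studies}")
print(f"mean studies per patient: {studies_per_patient:.2f}")
for modality, share in shares.items():
    print(f"{modality}: {share:.1%}")
```

On these figures the first iteration holds 2,428 studies, or roughly 1.85 studies per patient on average, with CT accounting for under a tenth of the total.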

Paper Structure

This paper contains 20 sections, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Choropleth map showing the number of CR, DX, and CT studies in the first iteration. https://maigva.github.io/maps/HealthDepartCOVID19.html
  • Figure 2: Conceptual scheme of radiological report anonymization.
  • Figure 3: Example of a pixel-level annotation. Consolidation is marked in blue and ground glass in red.
  • Figure 4: MIDS structure for BIMCV COVID-19+ dataset. (A) Conceptual schema (B) General Template (C) Example of folder structure.
  • Figure 5: Top: histogram of patient age. Middle: histogram of the number of studies per subject. Bottom: histogram of the difference (in days) between the imaging study and the diagnostic test. For most studies, fewer than five days elapsed between the radiograph and the diagnostic test.
  • ...and 2 more figures