De-Identification of Medical Imaging Data: A Comprehensive Tool for Ensuring Patient Privacy

Moritz Rempe, Lukas Heine, Constantin Seibold, Fabian Hörst, Jens Kleesiek

TL;DR

An open-source tool that automates the de-identification of various medical imaging formats, enhancing the efficiency of de-identification processes and addressing the critical need for robust and user-friendly de-identification solutions in medical imaging.

Abstract

Medical data employed in research frequently comprises sensitive patient health information (PHI), which is subject to rigorous legal frameworks such as the General Data Protection Regulation (GDPR) or the Health Insurance Portability and Accountability Act (HIPAA). Consequently, these types of data must be pseudonymized prior to utilisation, which presents a significant challenge for many researchers. Given the vast array of medical data, it is necessary to employ a variety of de-identification techniques. To facilitate the anonymization process for medical imaging data, we have developed an open-source tool that can be used to de-identify DICOM magnetic resonance images, computed tomography images, whole slide images and magnetic resonance twix raw data. Furthermore, the implementation of a neural network enables the removal of text within the images. The proposed tool automates an elaborate anonymization pipeline for multiple types of inputs, reducing the need for additional tools used for de-identification of imaging data. We make our code publicly available at https://github.com/code-lukas/medical_image_deidentification.

Paper Structure

This paper contains 17 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overview of the proposed de-identification tool. The input data is first read, specific to its data type. Different optional anonymization steps can be performed. Metadata removal or pixel data cleaning, including skull-stripping, defacing or text removal can be performed for all common medical datatypes.
  • Figure 2: Exemplary excerpt of a DICOM de-identification profile (left) and part of an exemplary anonymized twix header (right). All patient-related information is anonymized by replacing the values with either zeros or ’x’.
  • Figure 3: Comparison of the defacing results of different defacing algorithms. While the results of pydeface and the proposed algorithm are similar, pydeface additionally cuts off the shoulder region of the scan and takes 260 times longer on average than the proposed algorithm.
  • Figure 4: Proposed text removal pipeline, illustrated on an ultrasound image. A rectangle is first inserted in the center of the scan so that tesseract focuses on the text at the side of the image. Any text in the middle of the image is then removed in further iterations.
  • Figure 5: Computation time for defacing and skull-stripping of the compared methods on the Synthstrip test dataset. The proposed method is faster than the compared state-of-the-art algorithms. The y-axis is scaled logarithmically for better visibility. The bold lines inside the plots depict the median value.
  • ...and 1 more figure
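
The masking rule described in the Figure 2 caption (patient-related values replaced with zeros or ’x’) can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function names, the example header keys, and the choice to map digits to zeros and letters to ’x’ are assumptions made purely for illustration.

```python
# Illustrative sketch only; names and the digit/letter mapping are assumptions,
# not taken from the published tool.

def mask_value(value: str) -> str:
    """Replace digits with '0' and letters with 'x', keeping length and punctuation."""
    return "".join(
        "0" if ch.isdigit() else "x" if ch.isalpha() else ch
        for ch in value
    )


def mask_header(header: dict[str, str], sensitive_keys: set[str]) -> dict[str, str]:
    """Return a copy of the header in which only the sensitive fields are masked."""
    return {
        key: mask_value(val) if key in sensitive_keys else val
        for key, val in header.items()
    }


# Hypothetical header excerpt with two patient-related fields.
example = {
    "PatientName": "Doe^Jane",
    "PatientBirthDate": "19800512",
    "Modality": "MR",
}
masked = mask_header(example, {"PatientName", "PatientBirthDate"})
print(masked)
```

Keeping the original length and punctuation means the header layout stays intact while the identifying content is removed; a real de-identification profile would decide per field whether to mask, blank, or delete the value.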