Development of a defacing algorithm to protect the privacy of head and neck cancer patients in publicly-accessible radiotherapy datasets

Kayla O'Sullivan-Steben, Luc Galarneau, John Kildea

TL;DR

Public head and neck CT datasets pose reidentification risks due to facial features. The authors introduce a pixel-removal defacing algorithm that crops anterior to the eye center while preserving OARs and PTVs and extends protection to DICOM-RT data, validated on 829 CT-sim scans from 622 patients. Privacy tests using FaceNet512 show reidentification drops from $97\%$ to $4\%$ after defacing, while OAR auto-segmentation remains intact below the defaced region and PTVs are largely unaffected ($86.0\%$ fully below defaced region; $4.9\%$ overlapping). The approach enables secure sharing of HNC imaging datasets for Big Data and AI without sacrificing radiotherapy research utility, addressing a critical privacy gap in radiotherapy data sharing.

Abstract

Introduction: The rise in public medical imaging datasets has raised concerns about patient reidentification from head CT scans. However, existing defacing algorithms often remove or distort Organs at Risk (OARs) and Planning Target Volumes (PTVs) in head and neck cancer (HNC) patients, and ignore DICOM-RT Structure Set and Dose data. Therefore, we developed and validated a novel automated defacing algorithm that preserves these critical structures while removing identifiable features from HNC CTs and DICOM-RT data.

Methods: Eye contours were used as landmarks to automate the removal of CT pixels above the inferior-most eye slice and anterior to the eye midpoint. Pixels within PTVs were retained if they intersected with the removed region. The body contour and dose map were reshaped to reflect the defaced image. We validated our approach on 829 HNC CTs from 622 patients. Privacy protection was evaluated by applying the FaceNet512 facial recognition algorithm before and after defacing on 3D-rendered CT pairs from 70 patients. Research utility was assessed by examining the impact of defacing on autocontouring performance using LimbusAI and analyzing PTV locations relative to the defaced regions.

Results: Before defacing, FaceNet512 matched 97% of patients' CTs. After defacing, this rate dropped to 4%. LimbusAI effectively autocontoured organs in the defaced CTs, with perfect Dice scores of 1 for OARs below the defaced region, and excellent scores exceeding 0.95 for OARs on the same slices as the crop. We found that 86% of PTVs were entirely below the cropped region, 9.1% were on the same slice as the crop without overlap, and only 4.9% extended into the cropped area.

Conclusions: We developed a novel defacing algorithm that anonymizes HNC CT scans and related DICOM-RT data while preserving essential structures, enabling the sharing of HNC imaging datasets for Big Data and AI.
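
The cropping rule described in the Methods can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `deface_mask` and `deface` are hypothetical, the volume is assumed to be indexed as (z, y, x) with z increasing toward the top of the head and y increasing toward the back of the head, the "eye midpoint" is approximated by the mean anterior-posterior index of the eye voxels, and removed voxels are filled with an assumed air value of -1000 HU. Reshaping of the body contour and dose map is not covered.

```python
import numpy as np

AIR_HU = -1000  # assumption: removed voxels are replaced with air


def deface_mask(eye_mask, ptv_mask=None):
    """Return a boolean mask of voxels to remove.

    Assumed layout (not from the source): arrays are indexed (z, y, x),
    z grows superiorly and y grows posteriorly.
    """
    z_idx, y_idx, _ = np.nonzero(eye_mask)
    if z_idx.size == 0:
        raise ValueError("eye mask is empty; no landmark available")

    z_inferior = int(z_idx.min())        # inferior-most slice containing eye
    y_mid = int(np.floor(y_idx.mean()))  # assumed definition of eye midpoint

    remove = np.zeros(eye_mask.shape, dtype=bool)
    # From the inferior-most eye slice upward, and anterior to the midpoint.
    remove[z_inferior:, :y_mid, :] = True

    if ptv_mask is not None:
        # Voxels inside a PTV are retained even inside the removed region.
        remove &= ~np.asarray(ptv_mask, dtype=bool)
    return remove


def deface(ct, eye_mask, ptv_mask=None, fill=AIR_HU):
    """Return a copy of the CT volume with the face region replaced by `fill`."""
    out = ct.copy()
    out[deface_mask(eye_mask, ptv_mask)] = fill
    return out
```

Whether the inferior-most eye slice itself is removed or kept is not stated in the abstract; the sketch removes it, and changing `z_inferior` to `z_inferior + 1` gives the other reading.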

Paper Structure

This paper contains 25 sections, 1 equation, 6 figures, 2 tables.

Figures (6)

  • Figure 1: (A) Overview of the automated defacing algorithm’s workflow. (B) Example of the workflow applied to a patient whose PTV intrudes into the defaced region.
  • Figure 2: Overview of the three groups of facial recognition tests performed. Each pair of image comparisons yields one cosine distance. Note that these images are artist-rendered 3D faces for visualization purposes. They do not represent the CTs of real patients.
  • Figure 3: 3D surface rendering of the reconstructed face of a head phantom before and after defacing. The head phantom data were retrieved from the SlicerRtData GitHub repository.
  • Figure 4: Results of the FaceNet512 facial recognition algorithm on 70 patients for our three pairing groups. Lower cosine distances indicate a higher likelihood that two scans are from the same patient. (A) presents histograms of the cosine distances of the three pairing groups tested. (B) shows the same data presented in whisker plots, with blue lines connecting data for the same patient.
  • Figure 5: Visualization of LimbusAI’s auto-contoured OARs before and after defacing on a sample patient. Not pictured are the brachial plexuses, clavicles, cochleas, hippocampi, left lung, submandibular glands, parotid glands, and the following right and left Lymph Node (LN) levels: Neck, Neck 2347AB, Neck IB, and Neck V.
  • ...and 1 more figure
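
The cosine distance mentioned in the captions of Figures 2 and 4 can be illustrated with a short Python sketch. It assumes each rendered face has already been converted to an embedding vector by a face recognition model; the function names and the `threshold` parameter are hypothetical, and no specific FaceNet512 threshold is implied here.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two embedding vectors: 1 - cos(angle)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - float(np.dot(a, b) / denom)


def is_match(a, b, threshold):
    """Treat two embeddings as the same person if their distance is small.

    `threshold` is a caller-supplied value (hypothetical parameter).
    """
    return cosine_distance(a, b) <= threshold
```

Identical directions give a distance of 0, orthogonal vectors give 1, and opposite vectors give 2, which is why lower values indicate a higher likelihood that two scans come from the same patient.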