DeIDClinic: A Multi-Layered Framework for De-identification of Clinical Free-text Data

Angel Paul; Dhivin Shaji; Lifeng Han; Warren Del-Pinto; Goran Nenadic

TL;DR

This work enhances the MASK framework by integrating ClinicalBERT, a deep learning model specifically fine-tuned on clinical texts, alongside traditional de-identification methods like dictionary lookup and rule-based approaches.

Abstract

De-identification is important for protecting patient privacy in healthcare text analytics. The MASK framework is among the best-performing systems on the de-identification shared tasks organised by the n2c2/i2b2 challenges. This work enhances the MASK framework by integrating ClinicalBERT, a deep learning model specifically fine-tuned on clinical texts, alongside traditional de-identification methods such as dictionary lookup and rule-based approaches. The system identifies sensitive identifiable entities within clinical documents and either redacts or replaces them, while also allowing users to customise the masked documents according to their specific needs. The integration of ClinicalBERT significantly improves entity recognition performance, achieving an F1-score of 0.9732, with the strongest gains on common entities such as names, dates, and locations. A risk assessment feature has also been developed, which analyses the uniqueness of context within documents to classify them into risk levels, guiding further de-identification efforts. While the system demonstrates strong overall performance, this work highlights areas for future improvement, including handling more complex entity occurrences and enhancing the system's adaptability to different clinical settings.
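To make the layered idea concrete, the following is a minimal sketch of how a dictionary layer and a rule layer might be combined before masking. It is not the DeIDClinic implementation: the names `NAME_DICTIONARY`, `DATE_RULE`, `find_entities`, and `mask` are hypothetical, the ClinicalBERT layer is omitted, and the "replace" mode here substitutes a tag placeholder, which is an assumption about what replacement could look like.

```python
import re

# Hypothetical name dictionary and date rule; neither is taken from the paper.
NAME_DICTIONARY = {"john smith", "mary jones"}
DATE_RULE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_entities(text):
    """Return sorted, non-overlapping (start, end, tag) spans."""
    spans = []
    # Dictionary layer: case-insensitive whole-word match of known names.
    for name in NAME_DICTIONARY:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((m.start(), m.end(), "NAME"))
    # Rule layer: regular expression for numeric dates.
    for m in DATE_RULE.finditer(text):
        spans.append((m.start(), m.end(), "DATE"))
    spans.sort()
    merged = []
    for span in spans:
        # When two layers overlap, keep the span that starts first.
        if not merged or span[0] >= merged[-1][1]:
            merged.append(span)
    return merged


def mask(text, mode="redact"):
    """Redact each span with 'XXX', or replace it with a tag placeholder."""
    out, cursor = [], 0
    for start, end, tag in find_entities(text):
        out.append(text[cursor:start])
        out.append("XXX" if mode == "redact" else f"[{tag}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

In this sketch a model-based layer would simply contribute further `(start, end, tag)` spans to the same list before merging, which is one plausible way to let the layers complement one another.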

Paper Structure (48 sections, 1 equation, 21 figures, 8 tables)

Introduction
Importance of Protecting Patient Information
Background on Clinical Text De-identification
Problem Statement
Objectives
Ethics
Significance and Highlights
Background and Literature
Literature on PHI Identification
Risk assessment in De-identification
Overview of the MASK Framework
DeIDClinic Design and Implementation Strategy
Aims and Objectives
Scope of the Investigation
System Architecture and Implementation
...and 33 more sections

Figures (21)

Figure 1: Example of original and redacted text
Figure 2: DeIDClinic Base Framework
Figure 3: DeIDClinic Framework Details
Figure 4: Sample output from the ClinicalBERT model after post-processing, in the format (Entity, Start, End, Prediction Tag)
Figure 5: Entity occurrence distribution in the i2b2 dataset
...and 16 more figures
