Slice-wise quality assessment of high b-value breast DWI via deep learning-based artifact detection

Ameya Markale; Luise Brock; Ihor Horishnyi; Dominika Skwierawska; Tri-Thien Nguyen; Hannes Schreiter; Shirin Heidarikahkesh; Lorenz A. Kapsner; Michael Uder; Sabine Ohlmeyer; Frederik B Laun; Andrzej Liebert; Sebastian Bickelhaupt

TL;DR

Detecting hyper- and hypointense artifacts in a slice-wise breast DWI dataset using CNNs, particularly DenseNet121, appears promising but requires further validation.

Abstract

Diffusion-weighted imaging (DWI) can support lesion detection and characterization in breast magnetic resonance imaging (MRI). However, high b-value diffusion-weighted acquisitions in particular are prone to intensity artifacts that can affect diagnostic image assessment. This study aims to detect both hyper- and hypointense artifacts on high b-value diffusion-weighted images (b = 1500 s/mm$^2$) using deep learning, employing either a binary classification (artifact presence) or a multiclass classification (artifact intensity) approach on a slice-wise dataset. This IRB-approved retrospective study used a single-center dataset comprising n = 11806 slices from routine 3T breast MRI examinations performed between 2022 and mid-2023. Three convolutional neural network (CNN) architectures (DenseNet121, ResNet18, and SEResNet50) were trained for binary classification of hyper- and hypointense artifacts. The best-performing model (DenseNet121) was applied to an independent holdout test set and was further trained separately for multiclass classification. Evaluation included the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), precision, and recall, as well as an analysis of predicted bounding box positions derived from the network's Grad-CAM heatmaps. DenseNet121 achieved AUROCs of 0.92 and 0.94 for hyper- and hypointense artifact detection, respectively, and weighted AUROCs of 0.85 and 0.88 for multiclass classification on single-slice high b-value diffusion-weighted images. A radiologist rated bounding box precision on a 1-5 Likert-like scale across 200 slices, giving mean scores of 3.33 ± 1.04 for hyperintense artifacts and 2.62 ± 0.81 for hypointense artifacts. Hyper- and hypointense artifact detection on slice-wise breast DWI (b = 1500 s/mm$^2$) using CNNs, particularly DenseNet121, appears promising but requires further validation.
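
As an illustration of the slice-wise evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example. AUROC is computed here from its rank interpretation (the probability that a randomly chosen artifact slice receives a higher score than a randomly chosen artifact-free slice, with ties counted as one half).

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values (1 = artifact present).
    scores: iterable of predicted artifact probabilities, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative slice")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall at a fixed threshold (0.5 is an assumed default)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example with four slices (hypothetical values, not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
toy_precision, toy_recall = precision_recall(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of slices, which is fine for illustration; for a dataset of the size reported here, a sort-based implementation from an established library would be the more practical choice.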

Paper Structure (17 sections, 7 figures, 6 tables)

Introduction
Materials and Methods
Datasets
Experimental Setup Overview
Pre-selection and Slice-wise Dataset Generation
Data Labeling
Data Splitting
Data Preparation: Masking, Preprocessing, and Augmentation
Deep Learning: Training and Setup
Visualization of Network Prediction
Performance Evaluation
Results
Reading Evaluation
Hyperintense Artifact
Hypointense Artifact
...and 2 more sections

Figures (7)

Figure 1: Schematic overview of the study workflow, including maximum intensity projection (MIP)-based case pre-selection, slice-wise data generation to form ground truth (GT), data splitting, and model selection using DenseNet121, ResNet18, and SEResNet50 architectures on the validation set for binary classification of hyper- and hypointense artifacts. The best performing model was used for inference on the test set for binary classification and subsequently retrained for multiclass classification of hyper- and hypointense artifacts.
Figure 2: Slices affected by hyper- and hypointense artifacts on DWI (b = 1500 s/mm$^2$): a) the enclosed region shows a hyperintense artifact caused by surface coil flare; b) the enclosed region shows a hyperintense artifact that is likely caused by skin folding; c-d) the enclosed regions depict hypointense artifacts.
Figure 3: True Positive (TP), False Positive (FP), and False Negative (FN) cases of bounding boxes predicted on masked high b-value (b = 1500 s/mm$^2$) DWI slices for binary hyper- and hypointense artifact classification on the holdout test set using DenseNet121. In FN cases, the red arrow shows the correct region of artifact.
Figure 4: Class-wise confusion matrices and areas under the receiver operating characteristic curve (AUROCs) illustrating the performance of the DenseNet121 model for multiclass classification of hyper- and hypointense artifacts on their respective holdout test sets. a) Confusion matrix and AUROC for hyperintense multiclass classification. b) Confusion matrix and AUROC for hypointense multiclass classification.
Figure 5: The heatmap shows the agreement, measured by Cohen's kappa, between ground truth (GT), Validator 1 (V1), Validator 2 (V2), and the classes predicted by DenseNet121 on the holdout test set for multiclass classification of hyper- and hypointense artifacts.
...and 2 more figures
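
The captions above refer to bounding boxes predicted from the network's Grad-CAM heatmaps. The text available here does not state how a heatmap is turned into a box, so the following is only a hedged sketch of one common approach: threshold the heatmap relative to its peak and take the tightest box around the surviving pixels. The function name `heatmap_to_bbox` and the relative threshold of 0.5 are assumptions for illustration, not the authors' method.

```python
def heatmap_to_bbox(heatmap, rel_threshold=0.5):
    """Return (row_min, col_min, row_max, col_max) around strongly activated pixels.

    heatmap: 2D list of non-negative activation values (e.g. a Grad-CAM map).
    rel_threshold: fraction of the peak value a pixel must reach to be kept
                   (assumed value; the study's actual rule is not given here).
    Returns None when the heatmap has no positive activation.
    """
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        return None
    cutoff = rel_threshold * peak
    # Rows and columns that contain at least one pixel at or above the cutoff.
    rows = [r for r, row in enumerate(heatmap) if any(v >= cutoff for v in row)]
    cols = [c for row in heatmap for c, v in enumerate(row) if v >= cutoff]
    return (min(rows), min(cols), max(rows), max(cols))


# Toy 4x4 heatmap (hypothetical values, not study data).
toy_heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.0],
    [0.0, 0.6, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_bbox = heatmap_to_bbox(toy_heatmap)
```

This sketch draws a single box around all pixels above the cutoff, so two separate activation regions would be merged into one box; handling them individually would require a connected-component step.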
