The impact of deep learning aid on the workload and interpretation accuracy of radiologists on chest computed tomography: a cross-over reader study

Anvar Kurmukov; Valeria Chernina; Regina Gareeva; Maria Dugova; Ekaterina Petrash; Olga Aleshina; Maxim Pisov; Boris Shirokikh; Valentin Samokhin; Vladislav Proskurov; Stanislav Shimovolos; Maria Basova; Mikhail Goncahrov; Eugenia Soboleva; Maria Donskova; Farukh Yaushev; Alexey Shevtsov; Alexey Zakharov; Talgat Saparov; Victor Gombolevskiy; Mikhail Belyaev

TL;DR

Radiologist workforce shortages and rising demand for chest CT motivate evaluating a multi-pathology deep-learning aid (DLA). The authors conducted a cross-over reader study in which radiologists used EfficientReadCT as a DLA on 200 CT studies covering 12 pathologies, and they compared reading time and diagnostic performance across control, informed, and DLA-assisted arms. DLA use reduced mean interpretation time by 2.9 minutes per study (≈20.6%) and increased sensitivity by 28.4 percentage points, while specificity was largely preserved (-2.4 points, p=0.13). Gains were strongest for pathologies that rely on morphological measurements. The within-radiologist cross-over design shows consistent improvements in time and sensitivity across participants. This supports AI-assisted chest CT interpretation as a way to improve efficiency and detection and potentially ease radiologist workload.

Abstract

Interpretation of chest computed tomography (CT) is time-consuming. Previous studies have measured the time-saving effect of using a deep-learning-based aid (DLA) for CT interpretation. We evaluated the joint impact of a multi-pathology DLA on the time and accuracy of radiologists' reading. Forty radiologists were randomly split into three experimental arms: a control group (10), who interpreted studies without assistance; an informed group (10), who were briefed about the DLA pathologies but performed readings without it; and an experimental group (20), who interpreted half of their studies with the DLA and half without. All arms used the same 200 CT studies retrospectively collected from the BIMCV-COVID19 dataset, and each radiologist provided readings for 20 CT studies. We compared interpretation time and the accuracy of participants' diagnostic reports with respect to 12 pathological findings. Mean reading time per study was 15.6 minutes [SD 8.5] in the control arm, 13.2 minutes [SD 8.7] in the informed arm, 14.4 minutes [SD 10.3] in the experimental arm without DLA, and 11.4 minutes [SD 7.8] in the experimental arm with DLA. Mean sensitivity and specificity were 41.5 [SD 30.4] and 86.8 [SD 28.3] in the control arm; 53.5 [SD 22.7] and 92.3 [SD 9.4] in the informed non-assisted arm; 63.2 [SD 16.4] and 92.3 [SD 8.2] in the experimental arm without DLA; and 91.6 [SD 7.2] and 89.9 [SD 6.0] in the experimental arm with DLA. The DLA sped up interpretation by 2.9 minutes per study (CI95 [1.7, 4.3], p<0.0005), increased sensitivity by 28.4 (CI95 [23.4, 33.4], p<0.0005), and decreased specificity by 2.4 (CI95 [0.6, 4.3], p=0.13). Of the 20 radiologists in the experimental arm, 16 improved both reading time and sensitivity, two improved their time with a marginal drop in sensitivity, and two improved sensitivity at the cost of increased time. Overall, DLA introduction decreased reading time by 20.6%.
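
The arm-level figures above come from simple counts. Sensitivity is the share of present findings that a reader reported. Specificity is the share of absent findings that a reader correctly left unreported. The cross-over effect is a per-reader difference between with-DLA and without-DLA readings.

The sketch below shows these computations and a reader-level outcome classification similar to the one in the abstract. It makes several assumptions that are illustrative, not the authors' documented statistical procedure:
- the hypothetical `Reading` record layout,
- pooling all findings within a reader, and
- a percentile bootstrap that resamples readers.

All names in the code are hypothetical.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    """One reader's report on one CT study (hypothetical record layout).

    `truth` maps each finding name to True (present) or False (absent) in the
    ground truth; `reported` is the set of findings the reader called present.
    """
    reader: str
    with_dla: bool
    minutes: float
    reported: set
    truth: dict


def sens_spec(readings):
    """Sensitivity and specificity in %, pooling all findings of all readings (assumption)."""
    tp = fn = tn = fp = 0
    for r in readings:
        for finding, present in r.truth.items():
            called = finding in r.reported
            if present and called:
                tp += 1
            elif present:
                fn += 1
            elif called:
                fp += 1
            else:
                tn += 1
    sens = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    spec = 100.0 * tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def reader_deltas(readings):
    """Per reader: (with DLA minus without DLA) change in mean minutes and in sensitivity."""
    deltas = {}
    for reader in sorted({r.reader for r in readings}):
        on = [r for r in readings if r.reader == reader and r.with_dla]
        off = [r for r in readings if r.reader == reader and not r.with_dla]
        d_time = mean(r.minutes for r in on) - mean(r.minutes for r in off)
        d_sens = sens_spec(on)[0] - sens_spec(off)[0]
        deltas[reader] = (d_time, d_sens)
    return deltas


def categorize(d_time, d_sens):
    """Label a reader's change, echoing the outcome groups described in the abstract."""
    if d_time < 0 and d_sens > 0:
        return "faster and more sensitive"
    if d_time < 0:
        return "faster, sensitivity not improved"
    if d_sens > 0:
        return "more sensitive, not faster"
    return "no improvement"


def paired_bootstrap_ci(diffs, n_boot=10_000, alpha=0.05, seed=0):
    """Mean of per-reader paired differences with a percentile bootstrap CI over readers."""
    rng = random.Random(seed)
    n = len(diffs)
    boots = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = boots[int(n_boot * alpha / 2)]
    hi = boots[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return mean(diffs), (lo, hi)


if __name__ == "__main__":
    # Toy data: two readers, each reading one study with and one without the aid.
    truth = {"nodule": True, "emphysema": False, "fracture": True}
    data = [
        Reading("R1", False, 15.0, {"nodule"}, truth),
        Reading("R1", True, 11.0, {"nodule", "fracture"}, truth),
        Reading("R2", False, 14.0, {"fracture", "emphysema"}, truth),
        Reading("R2", True, 12.0, {"nodule", "fracture"}, truth),
    ]
    deltas = reader_deltas(data)
    for reader, (d_time, d_sens) in deltas.items():
        print(reader, d_time, d_sens, categorize(d_time, d_sens))
    print(paired_bootstrap_ci([d for d, _ in deltas.values()], n_boot=1000))
```

Resampling readers rather than individual readings keeps each reader's with-DLA and without-DLA results paired, which is the point of a cross-over design. The paper's own analysis may differ, for example by resampling studies or by fitting a mixed-effects model.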

Paper Structure (25 sections, 7 figures, 6 tables)

Introduction
Methods
Software
Study design
Groups without DLA
The experimental group
Validation data
Ground truth development
Statistical analysis
Results
BIMCV vs Control vs Informed
Experimental group
Inter-reader variability
Time and performance
Findings severity
...and 10 more sections

Figures (7)

Figure 1: Study design.
Figure 2: Time vs F1 score in the experimental group, by participant. Each line represents one participant from the experimental arm.
Figure 3: Sensitivity and specificity by pathology for participants in the experimental arm, with and without DLA.
Figure 4: Original CT image and four CT windows provided by the DLA: Fusion, Abdomen (SOFT), Lung, Bone. Best viewed in color.
Figure 5: Summary images provided as the first images of the processed CT series for each finding: aorta, lungs, mediastinal lymph node, pulmonary trunk, rib fractures, vertebrae with Genant index measurements. Best viewed in color.
...and 2 more figures
