Class Imbalance Correction for Improved Universal Lesion Detection and Tagging in CT

Peter D. Erickson, Tejas Sudharshan Mathai, Ronald M. Summers

TL;DR

The public DeepLesion CT dataset suffers from severe imbalance across body-part lesion labels, missing annotations, and inconsistent tags, all of which hinder automatic universal lesion detection and tagging. Using a limited annotated subset $D_L$, the authors train the VFNet detector and compare it with Faster RCNN, RetinaNet, and FoveaBox, applying Weighted Boxes Fusion (WBF) to combine predictions. They run three data-balancing experiments, $E_{BP}$, $E_{N}$, and $E_{S}$, which balance the data by body part label, per-patient lesion count, and lesion size respectively, and compare them against a randomly sampled, unbalanced baseline $E_{U}$. Balancing by body part label increases recall for under-represented classes across models, and balancing by lesion size improves VFNet recall for all classes. The paper also provides a structured reporting guideline for the radiology report.

Abstract

Radiologists routinely detect and size lesions in CT to stage cancer and assess tumor burden. To potentially aid their efforts, multiple lesion detection algorithms have been developed with a large public dataset called DeepLesion (32,735 lesions, 32,120 CT slices, 10,594 studies, 4,427 patients, 8 body part labels). However, this dataset contains missing measurements and lesion tags, and exhibits a severe imbalance in the number of lesions per label category. In this work, we utilize a limited subset of DeepLesion (6%, 1331 lesions, 1309 slices) containing lesion annotations and body part label tags to train a VFNet model to detect lesions and tag them. We address the class imbalance by conducting three experiments: 1) Balancing data by the body part labels, 2) Balancing data by the number of lesions per patient, and 3) Balancing data by the lesion size. In contrast to a randomly sampled (unbalanced) data subset, our results indicated that balancing the body part labels always increased sensitivity for lesions $\geq$ 1cm for classes with low data quantities (Bone: 80% vs. 46%, Kidney: 77% vs. 61%, Soft Tissue: 70% vs. 60%, Pelvis: 83% vs. 76%). Similar trends were seen for three other models tested (FasterRCNN, RetinaNet, FoveaBox). Balancing data by lesion size also helped the VFNet model improve recalls for all classes in contrast to an unbalanced dataset. We also provide a structured reporting guideline for a "Lesions" subsection to be entered into the "Findings" section of a radiology report. To our knowledge, we are the first to report the class imbalance in DeepLesion, and have taken data-driven steps to address it in the context of joint lesion detection and tagging.
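
The abstract does not say how the balanced subsets were sampled. As a rough illustration only, the sketch below shows one plausible approach: group lesions by a chosen attribute and downsample every group to the size of the smallest one. The function name `balance_by_key`, the record fields `label` and `sad_mm`, and the toy records are all hypothetical and are not taken from the paper or from DeepLesion.

```python
import random
from collections import Counter, defaultdict


def balance_by_key(lesions, key, seed=0):
    """Downsample every group (as defined by `key`) to the smallest group's size.

    This is an assumed balancing strategy, not the authors' procedure.
    """
    groups = defaultdict(list)
    for lesion in lesions:
        groups[key(lesion)].append(lesion)

    n_per_group = min(len(members) for members in groups.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    balanced = []
    for group_name in sorted(groups):
        balanced.extend(rng.sample(groups[group_name], n_per_group))
    return balanced


# Toy, made-up records (not DeepLesion data): 6 "Lung", 3 "Liver", 2 "Bone".
toy_lesions = (
    [{"label": "Lung", "sad_mm": 8.0 + i} for i in range(6)]
    + [{"label": "Liver", "sad_mm": 9.0 + i} for i in range(3)]
    + [{"label": "Bone", "sad_mm": 12.0 + i} for i in range(2)]
)

# Balance by body part label.
by_body_part = balance_by_key(toy_lesions, key=lambda les: les["label"])

# Balance by lesion size: short axis diameter (SAD) >= 1 cm versus < 1 cm.
by_size = balance_by_key(toy_lesions, key=lambda les: les["sad_mm"] >= 10.0)

print(Counter(les["label"] for les in by_body_part))
print(Counter(les["sad_mm"] >= 10.0 for les in by_size))
```

The same helper covers both the body-part and the lesion-size experiments because only the grouping key changes. Downsampling discards data from the larger groups; oversampling the smaller groups would be an alternative with different trade-offs.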

Paper Structure

This paper contains 5 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: (a) shows the lesion distribution per body part label in the DeepLesion dataset (Yan et al., 2018), with certain over-represented and under-represented classes. (b) shows the number of patients with a specific number of lesions annotated. (c) Compared to the unbalanced dataset ${D}_{U}$, our dataset ${D}_{BP}$ balanced the number of lesions across the different body part classes (orange). (d) shows the lesion distribution for patients who were divided into two groups: G1 had patients with 1-2 lesions and G2 had patients with 3+ lesions. Compared to ${D}_{U}$, dataset ${D}_{N}$ (orange) had an equal number of lesions in G1 and G2. The number of patients in each group was not balanced. (e) shows the lesion distribution categorized by the short axis diameter (SAD) length. Compared to ${D}_{U}$, in dataset ${D}_{S}$ the numbers of lesions with SAD $\geq$ 1cm and SAD $<$ 1cm were balanced (orange). (f) Four lesions were detected in the chest area. Green boxes: GT, yellow boxes: TP, red boxes: FP. The top-3 predictions, their labels, and confidence scores were compiled into a structured "Lesions" sub-section for entry into the "Findings" section of a radiology report. Only lesions that were predicted with confidences $\geq$ 50% are shown. Figure is best viewed electronically in color.
  • Figure 2: Columns (a)-(d) show outputs of the various models on slices from CT volumes of two different patients. The first row of each pairing represents the model output after being trained on the unbalanced dataset ${D}_{U}$, while the second row shows results when trained on the dataset balanced by body part labels, ${D}_{BP}$. Green boxes: GT, yellow boxes: TP, red boxes: FP. The predicted classes and confidence scores are also shown. The first pair shows that models trained with ${D}_{U}$ did not identify and classify a "Bone" lesion correctly (first row), whereas those trained on ${D}_{BP}$ did (second row). In particular, VFNet trained on ${D}_{BP}$ predicted correctly with a confidence of 97%. The second pair shows fewer FP for VFNet with ${D}_{BP}$, and a missed detection for FoveaBox (last row).
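
The caption of Figure 1(f) describes compiling the top-3 predictions, their labels, and confidence scores into a structured "Lesions" sub-section, keeping only predictions with confidence $\geq$ 50%. A minimal sketch of one reading of that step follows. The function name `format_lesions_subsection`, the prediction fields `label` and `score`, the output wording, and the toy predictions are all assumptions for illustration; the source does not specify the report format.

```python
def format_lesions_subsection(predictions, min_confidence=0.5, top_k=3):
    """Build a plain-text "Lesions" sub-section from detector predictions.

    Keeps predictions with score >= min_confidence, then lists the top_k
    by descending score. The layout is hypothetical, not the paper's.
    """
    kept = [p for p in predictions if p["score"] >= min_confidence]
    kept.sort(key=lambda p: p["score"], reverse=True)

    lines = ["Lesions:"]
    for rank, pred in enumerate(kept[:top_k], start=1):
        lines.append(f"  {rank}. {pred['label']} lesion, confidence {pred['score']:.0%}")
    return "\n".join(lines)


# Toy, made-up predictions (not model output from the paper).
toy_predictions = [
    {"label": "Lung", "score": 0.62},
    {"label": "Bone", "score": 0.97},
    {"label": "Mediastinum", "score": 0.41},  # dropped: below 50%
    {"label": "Lung", "score": 0.88},
    {"label": "Soft Tissue", "score": 0.55},  # dropped: outside the top 3
]

report = format_lesions_subsection(toy_predictions)
print(report)
```

Filtering before ranking means a low-confidence prediction can never fill one of the top-3 slots, so the sub-section may list fewer than three lesions.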