3D Universal Lesion Detection and Tagging in CT with Self-Training

Jared Frazier, Tejas Sudharshan Mathai, Jianfei Liu, Angshuman Paul, Ronald M. Summers

TL;DR

This work presents a self-training pipeline for 3D universal lesion detection and tagging (ULDT) in CT, starting from a 2D VFNet model trained on only 30% of the DeepLesion data and expanding detections into 3D through IoU-based aggregation and weighted fusion. By iterating intra- and inter-patient mining across four rounds and employing static and variable confidence thresholds plus class-balancing upsampling, the method achieves a mean sensitivity of $46.9\%$ at $[0.125:8]$ FP/vol, rivaling prior approaches that used the full DeepLesion dataset. The approach yields 3D lesion extents and body-part tags, enabling more accurate tumor burden assessment, with results comparable to state-of-the-art methods while relying on substantially less labeled data. This has practical implications for scalable ULDT in clinical CT workflows and lays groundwork for further improvements in 3D lesion understanding and tagging.
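The 2D-to-3D expansion mentioned above can be illustrated with a minimal sketch. It assumes detections on consecutive slices are chained when their in-plane IoU clears a threshold, and that each chain is fused by a confidence-weighted average of its boxes. The names `Det2D`, `link_to_3d`, and `fuse`, the greedy chaining rule, and the 0.3 linking threshold are hypothetical choices for illustration; the summary does not specify the authors' exact aggregation or fusion procedure.

```python
from dataclasses import dataclass


@dataclass
class Det2D:
    z: int        # slice index
    box: tuple    # (x1, y1, x2, y2) in pixels
    score: float  # detector confidence


def iou_2d(a, b):
    """Intersection over union of two axis-aligned 2D boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_to_3d(dets, iou_thr=0.3):
    """Greedily chain detections on adjacent slices (assumed rule, not the paper's)."""
    tracks = []
    for d in sorted(dets, key=lambda d: d.z):
        for t in tracks:
            last = t[-1]
            # Extend a track only from the immediately preceding slice.
            if d.z - last.z == 1 and iou_2d(d.box, last.box) >= iou_thr:
                t.append(d)
                break
        else:
            tracks.append([d])
    return tracks


def fuse(track):
    """Confidence-weighted average box plus the slice range of one track."""
    total = sum(d.score for d in track)
    box = tuple(sum(d.score * d.box[i] for d in track) / total for i in range(4))
    return {"z_range": (track[0].z, track[-1].z), "box": box,
            "score": total / len(track)}


dets = [
    Det2D(10, (0, 0, 10, 10), 0.9),
    Det2D(11, (1, 1, 11, 11), 0.8),
    Det2D(12, (1, 0, 11, 10), 0.7),
    Det2D(11, (50, 50, 60, 60), 0.6),  # unrelated detection on slice 11
]
tracks = link_to_3d(dets)
proposal = fuse(tracks[0])
```

In this toy input the three overlapping boxes on slices 10 through 12 form one 3D proposal, while the distant box on slice 11 stays a separate single-slice track.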

Abstract

Radiologists routinely perform the tedious task of lesion localization, classification, and size measurement in computed tomography (CT) studies. Universal lesion detection and tagging (ULDT) can simultaneously help alleviate the cumbersome nature of lesion measurement and enable tumor burden assessment. Previous ULDT approaches utilize the publicly available DeepLesion dataset; however, it does not provide the full volumetric (3D) extent of lesions and also displays a severe class imbalance. In this work, we propose a self-training pipeline to detect 3D lesions and tag them according to the body part they occur in. We used a significantly limited 30% subset of DeepLesion to train a VFNet model for 2D lesion detection and tagging. Next, the 2D lesion context was expanded into 3D, and the mined 3D lesion proposals were integrated back into the baseline training data in order to retrain the model over multiple rounds. Through the self-training procedure, our VFNet model learned from its own predictions, detected lesions in 3D, and tagged them. Our results indicated that our VFNet model achieved an average sensitivity of 46.9% at [0.125:8] false positives (FP) with a limited 30% data subset in comparison to the 46.8% of an existing approach that used the entire DeepLesion dataset. To our knowledge, we are the first to jointly detect lesions in 3D and tag them according to the body part label.
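To make the reported metric concrete, the following sketch averages sensitivity over a set of FP-per-volume operating points. It assumes that [0.125:8] denotes the levels 0.125, 0.25, 0.5, 1, 2, 4, and 8, and that each true-positive detection matches a distinct ground-truth lesion. The function `mean_sensitivity` and its inputs are hypothetical and are not taken from the authors' evaluation code.

```python
# Assumed reading of "[0.125:8]": powers of two from 1/8 to 8 FP per volume.
FP_LEVELS = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def mean_sensitivity(dets, n_lesions, n_volumes, fp_levels=FP_LEVELS):
    """dets: iterable of (score, is_tp) pooled over all volumes.

    Returns (mean sensitivity, per-level sensitivities).
    """
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    curve = []  # (FP per volume, sensitivity) after each detection is admitted
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((fp / n_volumes, tp / n_lesions))

    per_level = []
    for level in fp_levels:
        # Best sensitivity reachable without exceeding this FP budget.
        reachable = [s for f, s in curve if f <= level]
        per_level.append(max(reachable, default=0.0))
    return sum(per_level) / len(per_level), per_level


# Toy example: 5 detections, 4 ground-truth lesions, 2 volumes.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
avg, per_level = mean_sensitivity(toy, n_lesions=4, n_volumes=2, fp_levels=(0.5, 1))
```

With the two toy operating points, sensitivity is 0.5 at 0.5 FP per volume and 0.75 at 1 FP per volume, so the mean is 0.625.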

Paper Structure

This paper contains 6 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a) Distribution of lesion labels in the DeepLesion validation + test splits, showing class imbalances. (b) Proposed self-training pipeline. A VFNet model was trained on 2D lesion annotations from the baseline training split, and it iteratively mined new 2D predictions on the training data and on an unseen data split (the original DeepLesion training split). Following this step, the 2D predictions were consolidated based on IoU overlap and expanded into 3D. Next, the 3D predictions were incorporated into the baseline training split for re-training the model over multiple rounds. (c) - (e) 3D predictions with confidence score $\geq$ 50% shown for different slices of an abdominal CT volume. They were generated by the ensemble of models from different mining rounds self-trained with the variable threshold policy. Green: Ground Truth, Yellow: True Positive (IoU $\geq$ 30% [Yan2021_LENS]), and Red: False Positive. Lesion 1 is a GT "abdomen" lesion spanning slices $[58, 73]$, while the predicted 3D proposal spans slices $[53, 73]$. Despite GT boxes being absent for lesion 1 in slices 53 through 57, the 3D proposal has an IoU overlap $\geq$ 30% [Yan2021_LENS] with the GT and is considered a TP. Notice that lesion 1 is visible across all displayed slices. Lesion 2 is a GT "kidney" lesion spanning slices $[52, 57]$, with the predicted 3D lesion spanning slices $[53, 54]$. Lesion 3 is a predicted 3D "abdomen" lesion spanning slices $[59, 63]$, and it was counted as a FP during lesion detection + tagging. This lesion was not annotated in the official DeepLesion test split, and thus the tag from this split was not mapped to the fully annotated LENS test split (see the Methods section for details). Therefore, it is not a FP with respect to detection only. Also, the correct tag for lesion 3 is a "kidney" cyst, but these lesion types are often confused with the "liver" and "abdomen" classes. Lesion 3 is not shown in slice 64 because its predicted extent ends at slice 63.
  • Figure 2: Normalized Confusion Matrix: No Self-Training (Baseline).
  • Figure 3: Normalized Confusion Matrix: Ensemble of Static Threshold Models.
  • Figure 4: Normalized Confusion Matrix: Ensemble of Variable Threshold Models.
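The true-positive criterion in the Figure 1 caption (3D IoU of at least 30% with the ground truth) can be sketched as follows. The slice ranges for lesion 1 come from the caption; the in-plane box coordinates are hypothetical, and treating slice ranges as inclusive index ranges is an assumption about how the overlap is counted.

```python
def iou_3d(a, b):
    """3D IoU for boxes given as (x1, y1, x2, y2, z_first, z_last).

    x/y are continuous pixel coordinates; z_first and z_last are
    inclusive slice indices (an assumption for this sketch).
    """
    dx = min(a[2], b[2]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[1], b[1])
    dz = min(a[5], b[5]) - max(a[4], b[4]) + 1
    if dx <= 0 or dy <= 0 or dz <= 0:
        return 0.0

    def volume(r):
        return (r[2] - r[0]) * (r[3] - r[1]) * (r[5] - r[4] + 1)

    inter = dx * dy * dz
    return inter / (volume(a) + volume(b) - inter)


# Lesion 1: GT spans slices [58, 73], the proposal spans [53, 73].
# The in-plane box (100, 100, 140, 140) is made up and shared by both.
gt = (100, 100, 140, 140, 58, 73)
pred = (100, 100, 140, 140, 53, 73)

overlap = iou_3d(gt, pred)
is_tp = overlap >= 0.30
```

Under these assumptions the overlap reduces to the ratio of shared slices to total slices, 16/21, which is well above the 30% threshold even though the proposal extends five slices beyond the annotated extent.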