3D Universal Lesion Detection and Tagging in CT with Self-Training
Jared Frazier, Tejas Sudharshan Mathai, Jianfei Liu, Angshuman Paul, Ronald M. Summers
TL;DR
This work presents a self-training pipeline for 3D universal lesion detection and tagging (ULDT) in CT. It starts from a 2D VFNet model trained on only 30% of the DeepLesion data and expands the 2D detections into 3D through IoU-based aggregation and weighted fusion. By iterating intra- and inter-patient mining across four rounds, with static and variable confidence thresholds plus class-balancing upsampling, the method achieves a mean sensitivity of $46.9\%$ at $[0.125:8]$ FP/vol, comparable to the $46.8\%$ of a prior approach that used the full DeepLesion dataset. The approach yields 3D lesion extents and body-part tags, which can support tumor burden assessment, while relying on substantially less labeled data. This makes ULDT more practical to scale in clinical CT workflows and lays groundwork for further work on 3D lesion detection and tagging.
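The IoU-based aggregation and weighted fusion mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy adjacent-slice linking rule, the `iou_thr` value of 0.5, the score-weighted averaging, and the names `iou_2d`, `link_slices`, and `fuse_track` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Det = Tuple[int, Box, float]             # (slice index, box, confidence)


def iou_2d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_slices(dets: Dict[int, List[Tuple[Box, float]]],
                iou_thr: float = 0.5) -> List[List[Det]]:
    """Greedily chain 2D boxes on adjacent slices into 3D tracks.

    `dets` maps a slice index to its (box, score) detections. A box joins
    the track whose last box lies on the previous slice and overlaps it
    the most, provided the IoU reaches `iou_thr` (an assumed value).
    """
    tracks: List[List[Det]] = []
    for z in sorted(dets):
        for box, score in dets[z]:
            best, best_iou = None, iou_thr
            for track in tracks:
                last_z, last_box, _ = track[-1]
                if last_z == z - 1:
                    overlap = iou_2d(last_box, box)
                    if overlap >= best_iou:
                        best, best_iou = track, overlap
            if best is None:
                tracks.append([(z, box, score)])
            else:
                best.append((z, box, score))
    return tracks


def fuse_track(track: List[Det]) -> Tuple[Box, Tuple[int, int], float]:
    """Collapse a track into one 3D proposal.

    Returns the score-weighted mean box, the (first, last) slice range,
    and the mean confidence. Falls back to uniform weights if all scores
    are zero.
    """
    weights = [s for _, _, s in track]
    total = sum(weights)
    if total == 0:
        weights, total = [1.0] * len(track), float(len(track))
    fused = tuple(
        sum(b[i] * w for (_, b, _), w in zip(track, weights)) / total
        for i in range(4)
    )
    zs = [z for z, _, _ in track]
    mean_score = sum(s for _, _, s in track) / len(track)
    return fused, (min(zs), max(zs)), mean_score


# Toy input: two overlapping boxes on adjacent slices, one isolated box.
example = {
    10: [((10.0, 10.0, 20.0, 20.0), 0.9)],
    11: [((11.0, 11.0, 21.0, 21.0), 0.6)],
    13: [((10.0, 10.0, 20.0, 20.0), 0.8)],
}
tracks = link_slices(example)
proposals = [fuse_track(t) for t in tracks]
```

In this toy input the boxes on slices 10 and 11 overlap enough to form one track, while the box on slice 13 starts a new one because slice 12 carries no detection. A real pipeline would likely need to tolerate such gaps.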
Abstract
Radiologists routinely perform the tedious task of lesion localization, classification, and size measurement in computed tomography (CT) studies. Universal lesion detection and tagging (ULDT) can both alleviate the cumbersome nature of lesion measurement and enable tumor burden assessment. Previous ULDT approaches use the publicly available DeepLesion dataset; however, it does not provide the full volumetric (3D) extent of lesions and also displays a severe class imbalance. In this work, we propose a self-training pipeline to detect 3D lesions and tag them according to the body part they occur in. We used a significantly limited 30\% subset of DeepLesion to train a VFNet model for 2D lesion detection and tagging. Next, the 2D lesion context was expanded into 3D, and the mined 3D lesion proposals were integrated back into the baseline training data to retrain the model over multiple rounds. Through the self-training procedure, our VFNet model learned from its own predictions, detected lesions in 3D, and tagged them. Our VFNet model achieved an average sensitivity of 46.9\% at [0.125:8] false positives (FP) with the limited 30\% data subset, compared with 46.8\% for an existing approach that used the entire DeepLesion dataset. To our knowledge, we are the first to jointly detect lesions in 3D and tag them according to the body part label.
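As a worked illustration of the reported metric, the sketch below averages sensitivity over a set of false-positive operating points. It assumes that [0.125:8] denotes the rates 0.125, 0.25, 0.5, 1, 2, 4, and 8 FP per volume, and that every true-positive detection matches a distinct ground-truth lesion. The function name `sensitivity_at_fp` and the toy detections are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed reading of "[0.125:8]": FP rates doubling from 0.125 to 8.
FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def sensitivity_at_fp(dets: Sequence[Tuple[float, bool]],
                      n_lesions: int,
                      n_volumes: int,
                      fp_rates: Sequence[float] = FP_RATES) -> List[float]:
    """Sensitivity at each allowed false-positive rate per volume.

    `dets` holds (confidence, is_true_positive) pairs pooled over all
    volumes. Detections are swept from highest to lowest confidence, and
    for each rate the sweep stops once FPs per volume would exceed it.
    """
    ordered = sorted(dets, key=lambda d: d[0], reverse=True)
    result = []
    for rate in fp_rates:
        tp = fp = best_tp = 0
        for _, is_tp in ordered:
            if is_tp:
                tp += 1
            else:
                fp += 1
            if fp / n_volumes > rate:
                break
            best_tp = tp
        result.append(best_tp / n_lesions)
    return result


# Toy data: 5 pooled detections, 4 ground-truth lesions, 4 volumes.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
per_rate = sensitivity_at_fp(toy, n_lesions=4, n_volumes=4)
mean_sensitivity = sum(per_rate) / len(per_rate)
```

The mean over the seven operating points is the kind of single summary number quoted above; ties between confidence scores are not handled specially in this sketch.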
