Correcting Class Imbalances with Self-Training for Improved Universal Lesion Detection and Tagging
Alexander Shieh, Tejas Sudharshan Mathai, Jianfei Liu, Angshuman Paul, Ronald M. Summers
TL;DR
The paper addresses the challenge of incomplete annotations and class imbalance in universal lesion detection and tagging (ULDT) in CT. It introduces a self-training pipeline in which VFNet, trained on a small annotated DeepLesion subset, mines additional lesions from a larger unlabeled set and is then retrained iteratively over four rounds. A key contribution is balancing the mined data via upsampling. Combined with a variable-threshold mining policy, this yields substantial gains in mean sensitivity at 4 false positives (FP): 6.5 percentage points over self-training without class balancing (78.5% vs 72.0%), and 11.7 points over the same threshold policy without upsampling (78.5% vs 66.8%). The approach improves or maintains per-class sensitivity across all eight lesion classes. This shows the potential of unannotated data for ULDT in clinical workflows, while leaving room for further refinement such as class-aware thresholding and alternative semi-supervised methods.
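The headline metric, sensitivity at 4 FP, is one operating point on a free-response ROC (FROC) curve. The detector's score threshold is lowered until the average number of false positives reaches the budget, and the fraction of ground-truth lesions found at that threshold is reported. The sketch below computes it under stated assumptions. Detections are already matched to ground truth (a hypothetical prior step, e.g. IoU ≥ 0.5). FP are counted per image. Repeat hits on an already-found lesion are ignored. The name `sensitivity_at_fp` is illustrative and is not the authors' evaluation code.

```python
from itertools import groupby
from typing import Optional, Sequence, Tuple

# (confidence score, id of the matched ground-truth lesion, or None for a false positive).
# Assumption: matching to ground truth (e.g. IoU >= 0.5) was done beforehand.
Detection = Tuple[float, Optional[str]]


def sensitivity_at_fp(detections: Sequence[Detection], num_gt: int,
                      num_images: int, fp_per_image: float = 4.0) -> float:
    """Sensitivity at the lowest score threshold whose average number of
    false positives per image stays within fp_per_image."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    found: set = set()  # ground-truth lesions hit so far
    fp = 0
    best = 0.0
    # Lower the threshold one distinct score at a time; tied scores enter together.
    for _, group in groupby(ranked, key=lambda d: d[0]):
        for _, gt in group:
            if gt is None:
                fp += 1
            else:
                found.add(gt)  # repeat hits on the same lesion count once
        if fp / num_images > fp_per_image:
            break  # FP budget exceeded; keep the previous operating point
        best = len(found) / num_gt
    return best


# Toy example: 2 images, 3 lesions, budget of 1 FP per image.
dets = [(0.9, "a"), (0.8, None), (0.7, "b"), (0.6, None), (0.5, None), (0.4, "c")]
print(sensitivity_at_fp(dets, num_gt=3, num_images=2, fp_per_image=1.0))  # prints 0.6666666666666666
```

One plausible way to get per-class figures, like those reported for the 8 tags, is to call the function on the detections and ground-truth count restricted to a single class. The paper's exact per-class protocol is not stated here.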
Abstract
Universal lesion detection and tagging (ULDT) in CT studies is critical for tumor burden assessment and for tracking the progression of lesion status (growth/shrinkage) over time. However, a lack of fully annotated data hinders the development of effective ULDT approaches. Prior work used the DeepLesion dataset (4,427 patients, 10,594 studies, 32,120 CT slices, 32,735 lesions, 8 body part labels) for algorithmic development, but this dataset is not completely annotated and contains class imbalances. To address these issues, we developed a self-training pipeline for ULDT. A VFNet model was trained on a limited 11.5% subset of DeepLesion (bounding boxes + tags) to detect and classify lesions in CT studies. The model then identified novel lesion candidates in a larger, unseen data subset, incorporated them into its training set, and was retrained on the expanded set over multiple rounds. Multiple self-training experiments were conducted with different threshold policies to select higher-quality predicted lesions and to address the class imbalance. We discovered that direct self-training improved the sensitivities of over-represented lesion classes at the expense of under-represented ones. However, upsampling the lesions mined during self-training, combined with a variable threshold policy, raised sensitivity at 4 FP by 6.5 percentage points over self-training without class balancing (78.5% vs 72%) and by 11.7 points over the same self-training policy without upsampling (78.5% vs 66.8%). Furthermore, we show that our approach either improved or maintained the sensitivity at 4 FP for all 8 lesion classes.
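The mining-and-balancing step of one self-training round can be illustrated with a minimal sketch. Everything here is an assumption for illustration. The `Candidate` record is hypothetical. The round-dependent values in `ROUND_THRESHOLDS` are invented, since the abstract does not state the actual variable threshold policy. The balancing target, matching every class to the largest mined class, is also an assumption. `mine` keeps predictions on the unlabeled subset that clear the current round's score threshold. `upsample_by_class` then duplicates mined lesions of minority classes by sampling with replacement. This counteracts the drift toward over-represented classes that the authors observed with direct self-training.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    image_id: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    label: str  # one of the 8 body-part tags
    score: float


# Hypothetical per-round score thresholds; the paper's actual values are not given here.
ROUND_THRESHOLDS = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6}


def mine(preds: Sequence[Candidate], round_idx: int) -> List[Candidate]:
    """Keep predictions on the unlabeled subset that clear this round's threshold."""
    thr = ROUND_THRESHOLDS[round_idx]
    return [p for p in preds if p.score >= thr]


def upsample_by_class(mined: Sequence[Candidate], seed: int = 0) -> List[Candidate]:
    """Duplicate mined lesions of minority classes (sampling with replacement)
    until every class has as many entries as the largest class."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Candidate]] = defaultdict(list)
    for c in mined:
        by_class[c.label].append(c)
    if not by_class:
        return []
    target = max(len(items) for items in by_class.values())
    balanced: List[Candidate] = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


preds = [
    Candidate("s1", (10, 10, 40, 40), "lung", 0.95),
    Candidate("s2", (12, 8, 30, 35), "lung", 0.85),
    Candidate("s3", (50, 60, 90, 95), "liver", 0.92),
    Candidate("s4", (20, 25, 45, 50), "lung", 0.91),
    Candidate("s5", (70, 70, 80, 85), "bone", 0.50),
]
mined = mine(preds, round_idx=1)
balanced = upsample_by_class(mined)
print(sorted(c.label for c in balanced))  # prints ['liver', 'liver', 'lung', 'lung']
```

In a full loop, the balanced list would be merged with the original annotated subset before VFNet is retrained for the next round. A real pipeline would more likely upsample whole CT slices rather than individual boxes, since one slice can carry lesions of several classes. The box-level version simply keeps the idea visible.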
