Knee or ROC
Veronica Wendt, Byunggu Yu, Caleb Kelly, Junwhan Kim
TL;DR
This work tackles multi-class image classification when population representations are unknown by combining self-attention transformers with ad-hoc knee-based thresholding. It introduces three methods to derive thresholds and knee points, leveraging a Compact Convolutional Transformer (CCT) to maintain efficiency on small datasets. Experimental results on CIFAR-10 illustrate that Method 1 provides robust ROC curves for small multi-class cases, while Method 3 yields stable knee-based thresholds for larger multi-class scenarios; Method 2 offers an alternative but may underperform in practice. The approach enables threshold adaptation in live environments, with future work extending to larger datasets and integrating knee-based decisions into detection frameworks like YOLO and Faster R-CNN.
Abstract
Self-attention transformers have been shown to classify images accurately when trained on smaller data sets. However, tests to date have been limited to single-class image detection with known representation of image populations. When an input image may contain more than one class and the test set lacks full information on the representation of image populations, accuracy calculations must adapt. The Receiver Operating Characteristic (ROC) accuracy threshold can handle multi-class input images, but it is unsuitable when the representation of image populations is unknown. We therefore consider calculating accuracy using the knee method, which determines threshold values on an ad-hoc basis. We discuss ROC curve and knee threshold results for multi-class image detection on a multi-class data set created from CIFAR-10 images.
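To make the idea of a knee-based threshold concrete, the following is a minimal sketch and not one of the three methods evaluated in this work. It assumes that the classifier emits one confidence score per class for an image, sorts those scores in descending order, and takes the knee as the point farthest below the straight line joining the first and last points of the normalized curve (a common heuristic). The function names `knee_threshold` and `detect_classes` and the example scores are hypothetical and chosen only for illustration.

```python
def knee_threshold(scores):
    """Return the score at the knee of the descending score curve.

    Assumption: the knee is the point with the largest gap below the
    chord joining the first and last points after both axes are scaled
    to [0, 1]. Returns None when no knee can be defined.
    """
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    span = ordered[0] - ordered[-1] if n else 0.0
    if n < 3 or span == 0:
        return None  # too few points, or a flat curve

    best_i, best_gap = 0, float("-inf")
    for i, s in enumerate(ordered):
        x = i / (n - 1)                 # normalized rank
        y = (s - ordered[-1]) / span    # normalized score
        gap = 1.0 - x - y               # signed gap below the chord x + y = 1
        if gap > best_gap:
            best_i, best_gap = i, gap
    return ordered[best_i]


def detect_classes(scores):
    """Return indices of classes scoring strictly above the knee.

    Expects a non-empty list; falls back to the single top-scoring
    class when no knee exists.
    """
    threshold = knee_threshold(scores)
    if threshold is None:
        return [max(range(len(scores)), key=scores.__getitem__)]
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical per-class scores for one image containing two classes.
example_scores = [0.01, 0.46, 0.03, 0.41, 0.04, 0.02, 0.01, 0.01, 0.005, 0.005]
print(detect_classes(example_scores))  # [1, 3]
```

Because the threshold is derived from the shape of each image's own score curve, this sketch needs neither ground-truth labels nor knowledge of how often each class appears, which is the property that motivates a knee-based threshold when population representation is unknown.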
