A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics

Bryan Bo Cao, Abhinav Sharma, Lawrence O'Gorman, Michael Coss, Shubham Jain

TL;DR

We propose S, an efficient cosine similarity-based measure of classification difficulty that is calculated from the number of classes and the intra- and inter-class similarity metrics of a dataset. It can be used to help select an efficient model 6 to 29x faster than through repeated training and testing.

Abstract

Although accuracy and computation benchmarks are widely available to help choose among neural network models, they are usually obtained from models trained on datasets with many classes, and they do not give a good idea of performance for few (< 10) classes. The conventional procedure to predict performance involves repeated training and testing on the different models and dataset variations. We propose an efficient cosine similarity-based classification difficulty measure S that is calculated from the number of classes and intra- and inter-class similarity metrics of the dataset. After a single stage of training and testing per model family, relative performance for different datasets and models of the same family can be predicted by comparing difficulty measures, without further training and testing. Our proposed method is verified by extensive experiments on 8 CNN and ViT models and 7 datasets. Results show that S is highly correlated with model accuracy, with correlation coefficient |r| = 0.796, outperforming the baseline Euclidean distance at |r| = 0.66. We show how a practitioner can use this measure to help select an efficient model 6 to 29x faster than through repeated training and testing. We also describe using the measure in an industrial application, where it identifies options for selecting a model 42% smaller than the baseline YOLOv5-nano model and, if merging classes from 3 to 2 meets requirements, 85% smaller.
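
The abstract names the ingredients of S (number of classes, intra-class similarity, inter-class similarity) but does not give the formula, so the following is only a minimal sketch of how the two cosine similarity statistics might be computed from feature vectors. It is not the authors' definition of S. The function name `class_similarities`, the use of mean pairwise similarity, and the toy feature vectors are all assumptions made for illustration.

```python
import numpy as np

def class_similarities(features, labels):
    """Return (mean intra-class, mean inter-class) pairwise cosine similarity.

    Hypothetical helper: assumes at least two classes and at least one class
    with two or more samples. Not the paper's definition of S.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # L2-normalize rows so that dot products equal cosine similarities.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    same_class = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)  # drop each sample's self-similarity
    intra = float(sim[same_class & not_self].mean())
    inter = float(sim[~same_class].mean())
    return intra, inter

# Toy 2-D feature vectors for two well-separated classes (made-up data).
toy_features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
toy_labels = [0, 0, 1, 1]
intra, inter = class_similarities(toy_features, toy_labels)
print(f"intra-class: {intra:.3f}, inter-class: {inter:.3f}")
```

In this toy case the intra-class value is much larger than the inter-class value, which is what one would expect of an easy dataset. How these statistics are combined with the number of classes into S is specified in the paper itself, not here.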

Paper Structure

This paper contains 13 sections, 6 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Overall relationship of performance versus the number of classes $N_{CL}$. Each dot denotes an average of 3 subsets for each $N_{CL}$, while error bars represent standard deviations (each multiplied by 5 in (b) for visibility). (a) Image classification accuracy decreases for the classifiers tested when the number of CIFAR-10 classes is increased from 2 to 10. (b) Object detection accuracy and recall (R) decrease when the number of COCO classes is increased from 1 to 80. (c) Accuracy plot for increasingly smaller models from YOLOv5-nano through eight sub-YOLO models (SY1-8) and class groupings of 1 ($N_{CL}$:1), 10 ($N_{CL}$:10), and 80 ($N_{CL}$:80).
  • Figure 2: Overall trend of 5 efficient CNNs and 2 ViTs on 4 datasets: model performance tends to decrease as $N_{CL}$ increases. Each dot denotes the average accuracy over 5 subsets for a given $N_{CL}$. The error bars represent the standard deviation of accuracy across the 5 subsets. Accuracy: Top-1 accuracy. RN: ResNet. MNv2: MobileNetV2. EN: EfficientNet. MV: MobileViT. SW: Swin Transformer. CT256: CalTech256. CF100: CIFAR100. FD101: Food101. IN1K: ImageNet1K.
  • Figure 3: Matrices showing relationships among pairs of classes: (a) binary classification accuracy matrix using EfficientNet-B0, (b) binary-class similarity matrix with the $S_E$ metric, (c) nonlinear relationship between the binary classification accuracy in (a) and the similarity scores in (b). The polynomial function of degree $d=3$ (blue) has the lowest MSE compared with $d=2$ (green) and $d=1$ (red). Sim.: Similarity. Acc.: Accuracy. Rel.: Relationship.
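
The degree comparison described in the Figure 3 caption can be sketched as follows. This is an illustrative example only: the similarity and accuracy values are synthetic (generated from a cubic so the result is easy to check), the helper name `fit_mse` is hypothetical, and the MSE is measured on the fitted points because the caption does not say whether the authors used held-out data.

```python
import numpy as np

# Synthetic (similarity, accuracy) pairs, made up for illustration only.
similarity = np.linspace(0.1, 0.9, 9)
accuracy = 1.0 - 0.5 * similarity**3

def fit_mse(x, y, degree):
    """Least-squares polynomial fit; return the MSE on the fitted points."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.mean(residuals**2))

# Compare the three polynomial degrees named in the caption.
mse_by_degree = {d: fit_mse(similarity, accuracy, d) for d in (1, 2, 3)}
best_degree = min(mse_by_degree, key=mse_by_degree.get)
print(mse_by_degree, "best degree:", best_degree)
```

Because polynomials of degree 1, 2, and 3 are nested, the MSE on the fitted points cannot increase with degree, so a lower value for $d=3$ on the same points does not by itself show better generalization.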