Table of Contents
Fetching ...

Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement

Bryan Bo Cao, Lawrence O'Gorman, Michael Coss, Shubham Jain

TL;DR

The proposed Few-Class Arena (FCA) offers a new tool for efficient machine learning in the Few-Class Regime, with goals ranging from a new efficient class similarity proposal, to lightweight model architecture design, to a new scaling law.

Abstract

We propose Few-Class Arena (FCA), as a unified benchmark with focus on testing efficient image classification models for few classes. A wide variety of benchmark datasets with many classes (80-1000) have been created to assist Computer Vision architectural evolution. An increasing number of vision models are evaluated with these many-class datasets. However, real-world applications often involve substantially fewer classes of interest (2-10). This gap between many and few classes makes it difficult to predict performance of the few-class applications using models trained on the available many-class datasets. To date, little has been offered to evaluate models in this Few-Class Regime. We conduct a systematic evaluation of the ResNet family trained on ImageNet subsets from 2 to 1000 classes, and test a wide spectrum of Convolutional Neural Networks and Transformer architectures over ten datasets by using our newly proposed FCA tool. Furthermore, to aid an up-front assessment of dataset difficulty and a more efficient selection of models, we incorporate a difficulty measure as a function of class similarity. FCA offers a new tool for efficient machine learning in the Few-Class Regime, with goals ranging from a new efficient class similarity proposal, to lightweight model architecture design, to a new scaling law. FCA is user-friendly and can be easily extended to new models and datasets, facilitating future research work. Our benchmark is available at https://github.com/bryanbocao/fca.

Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement

TL;DR

The proposed Few-Class Arena (FCA) offers a new tool for efficient machine learning in the Few-Class Regime, with goals ranging from a new efficient class similarity proposal, to lightweight model architecture design, to a new scaling law.

Abstract

We propose Few-Class Arena (FCA), as a unified benchmark with focus on testing efficient image classification models for few classes. A wide variety of benchmark datasets with many classes (80-1000) have been created to assist Computer Vision architectural evolution. An increasing number of vision models are evaluated with these many-class datasets. However, real-world applications often involve substantially fewer classes of interest (2-10). This gap between many and few classes makes it difficult to predict performance of the few-class applications using models trained on the available many-class datasets. To date, little has been offered to evaluate models in this Few-Class Regime. We conduct a systematic evaluation of the ResNet family trained on ImageNet subsets from 2 to 1000 classes, and test a wide spectrum of Convolutional Neural Networks and Transformer architectures over ten datasets by using our newly proposed FCA tool. Furthermore, to aid an up-front assessment of dataset difficulty and a more efficient selection of models, we incorporate a difficulty measure as a function of class similarity. FCA offers a new tool for efficient machine learning in the Few-Class Regime, with goals ranging from a new efficient class similarity proposal, to lightweight model architecture design, to a new scaling law. FCA is user-friendly and can be easily extended to new models and datasets, facilitating future research work. Our benchmark is available at https://github.com/bryanbocao/fca.

Paper Structure

This paper contains 28 sections, 14 equations, 19 figures, 8 tables.

Figures (19)

  • Figure 1: Top-1 accuracies of various scales of ResNet, whose model sizes are shown in the legend, and whose plots vary from dark to light by decreasing size. Plots range along number of classes $N_{CL}$ from the full ImageNet size (1000) down to the Few-Class Regime. Each model is tested on 5 subsets whose $N_{CL}$ classes are randomly sampled from the original 1000 classes. (a) Plots for sub-models trained on subsets of classes (blue) and full models trained on all 1000 classes (red). (b) Zoomed window shows the standard deviation of subset's accuracies is much smaller than for the full model. (c.1) Full model accuracies drop when $N_{CL}$ decreases. (c.2) Full model accuracies increase as model scales up in the Few-Class Regime. (d.1) Sub-model accuracies grow as $N_{CL}$ decreases. (d.2) Sub-model accuracies do not increase when model scales up in the Few-Class Regime.
  • Figure 2: Many-Class Full Dataset Benchmark.
  • Figure 3: DCN-Full by Top-1 Accuracy ($\%$). $N_{CL}$ ranges from many to $2$.
  • Figure 4: DCN-Sub (blue) and DCN-Full (red) by Top-1 Accuracy ($\%$). $N_{CL}$ ranges from $2$ to $4$.
  • Figure 5: Pearson correlation coefficient ($r$) between DCN and SimSS when $N_{CL} \in \{2, 3, 4, 5, 10, 100\}$. DCN-Sub (blue squares) is more highly correlated than DCN-Full (red diamonds) with SimSS using both similarity base functions of CLIP (dashed line) and DINOv2 (solid line) with $r \geq 0.88$.
  • ...and 14 more figures