Fine-tuning Vision Classifiers On A Budget

Sunil Kumar, Ted Sandler, Paulina Varshavskaya

TL;DR

This work ties the notion of label quality to confidence in labeler accuracy and shows that, when prior estimates of labeler accuracy are available, using a simple naive-Bayes model to estimate the true labels allows us to label more data on a fixed budget without compromising label or fine-tuning quality.

Abstract

Fine-tuning modern computer vision models requires accurately labeled data. Ground truth for such data may not exist, but multiple labels per example can be obtained from labelers of variable accuracy. We tie the notion of label quality to confidence in labeler accuracy and show that, when prior estimates of labeler accuracy are available, using a simple naive-Bayes model to estimate the true labels allows us to label more data on a fixed budget without compromising label or fine-tuning quality. We present experiments on a dataset of industrial images that demonstrate that our method, called Ground Truth Extension (GTX), enables fine-tuning ML models using fewer human labels.
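
To make the naive-Bayes idea concrete, the following is a minimal sketch of how noisy binary votes might be combined into a posterior over the true label when each labeler's accuracy is known in advance. It is not the GTX implementation. The function name `posterior_yes`, the single symmetric accuracy per labeler (the same on YES and NO examples), and the uniform default prior are all assumptions made for illustration.

```python
import math


def posterior_yes(votes, accuracies, prior_yes=0.5):
    """Posterior probability that the true label is YES.

    Assumptions (illustrative, not taken from the paper):
      - labels are binary (True = YES, False = NO),
      - labelers vote independently given the true label,
      - each labeler has one known accuracy in (0, 1) that applies
        equally to YES and NO examples.
    """
    log_yes = math.log(prior_yes)
    log_no = math.log(1.0 - prior_yes)
    for vote, acc in zip(votes, accuracies):
        if vote:
            # A YES vote is correct if the truth is YES, wrong if it is NO.
            log_yes += math.log(acc)
            log_no += math.log(1.0 - acc)
        else:
            # A NO vote is wrong if the truth is YES, correct if it is NO.
            log_yes += math.log(1.0 - acc)
            log_no += math.log(acc)
    # Normalize in log space to avoid underflow with many votes.
    m = max(log_yes, log_no)
    p_yes = math.exp(log_yes - m)
    p_no = math.exp(log_no - m)
    return p_yes / (p_yes + p_no)


if __name__ == "__main__":
    # One accurate labeler can outweigh two weak ones that disagree with it.
    print(posterior_yes([True, False, False], [0.95, 0.6, 0.6]))
```

Under these assumptions, a single vote from a highly accurate labeler can already yield a confident posterior, which is the intuition behind spending fewer labels per example when labeler accuracy is known.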

Paper Structure (26 sections, 4 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Efficiency of Label Aggregation Methods. The figure shows the error rate of various label aggregation methods as a function of the number of labels per example. GTX demonstrates superior efficiency by using fewer labels to achieve lower error rates. See Section 3.4 for more details.
  • Figure 2: Dynamics of the uncertainty sampling strategy. For each method, we report how error rate and MAE improve as we collect labels using uncertainty sampling over 10 trials. The dataset has 5000 examples. We plot data starting at 5000 labels collected, after each example is labeled once. The error bars report standard error. Results are presented for (a) more accurate labelers and (b) less accurate labelers. See the section on uncertainty sampling results for details.
  • Figure 3: Example YES and NO frames from the singlehole-part-tapping dataset for the question: Is the wood block being drilled correctly? It must be touching the metal guide block on the left and the person must be holding it with his thumb and at least one additional finger.
  • Figure 4: Number of training set examples (leftmost column) and fine-tuning results as a function of labeling budget for GTX compared with the three baselines. We report the balanced error rate (middle column) and expected calibration error (rightmost column) on the withheld evaluation set.
  • Figure 5: Boxplots of balanced error on the evaluation set after fine-tuning a pretrained EfficientNet-b0 on the singlehole-part-tapping image classification dataset: labels from the more accurate labeler cohort (left) vs. the less accurate labeler cohort (right). Note the different $y$-axis scales.
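
The Figure 4 caption reports balanced error rate and expected calibration error. As a rough guide to what these metrics measure, here is a minimal sketch using their common textbook definitions: balanced error as one minus the mean per-class recall, and calibration error computed over equal-width confidence bins. The function names, the choice of 10 bins, and the binning scheme are assumptions; the paper's exact evaluation code may differ.

```python
def balanced_error_rate(y_true, y_pred):
    """One minus the mean per-class recall (assumed definition)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return 1.0 - sum(recalls) / len(recalls)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between accuracy and confidence per bin.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct:     booleans, True where the prediction was right.
    n_bins:      number of equal-width bins (an assumed default).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(acc - avg_conf)
    return ece
```

Balanced error weights each class equally regardless of how many examples it has, which matters when YES and NO frames are imbalanced; calibration error is low when predicted confidence matches how often predictions are correct.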