No Need to Sacrifice Data Quality for Quantity: Crowd-Informed Machine Annotation for Cost-Effective Understanding of Visual Data

Christopher Klugmann; Rafid Mahmood; Guruprasad Hegde; Amit Kale; Daniel Kondermann

TL;DR

This paper presents a framework that enables quality checking of visual data at large scales without sacrificing the reliability of the results, and shows that the posterior distributions over soft labels predicted by the model can be used as priors in further inference processes, reducing the need for numerous human labelers to approximate true soft labels accurately.
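The mechanism behind reusing posteriors as priors can be illustrated with standard Dirichlet-multinomial conjugacy. A Dirichlet prior with concentration vector alpha, combined with a vector n of per-answer vote counts, yields the posterior Dir(alpha + n). The minimal sketch below assumes this textbook parameterization. The function names and the "model-predicted" prior values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dirichlet_posterior(prior_alpha: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Conjugate update: a Dir(alpha) prior and multinomial vote counts n give Dir(alpha + n)."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must cover the same answer options")
    if any(a <= 0 for a in prior_alpha) or any(n < 0 for n in counts):
        raise ValueError("alpha must be positive and counts non-negative")
    return [a + n for a, n in zip(prior_alpha, counts)]


def concentration(alpha: Sequence[float]) -> float:
    """Total concentration alpha_0, i.e. the number of (pseudo-)votes the Dirichlet encodes."""
    return float(sum(alpha))


# Three answer options and three human votes: two for option 0, one for option 1.
votes = [2, 1, 0]
flat = dirichlet_posterior([1.0, 1.0, 1.0], votes)
informed = dirichlet_posterior([8.0, 1.5, 0.5], votes)  # hypothetical model-predicted prior
print(flat, concentration(flat))          # [3.0, 2.0, 1.0] 6.0
print(informed, concentration(informed))  # [10.0, 2.5, 0.5] 13.0
```

Under the pseudo-count reading, a confident machine prior contributes the equivalent of several votes. Fewer human answers are then needed to reach a given posterior concentration, which is one way to read the claim above. How the paper scales or calibrates the predicted concentrations is not shown in this excerpt.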

Abstract

Labeling visual data is expensive and time-consuming. Crowdsourcing systems promise to enable highly parallelizable annotations through the participation of monetarily or otherwise motivated workers, but even this approach has its limits. The solution: replace manual work with machine work. But how reliable are machine annotators? Sacrificing data quality for high throughput cannot be acceptable, especially in safety-critical applications such as autonomous driving. In this paper, we present a framework that enables quality checking of visual data at large scales without sacrificing the reliability of the results. We ask annotators simple questions with discrete answers, which can be highly automated using a convolutional neural network trained to predict crowd responses. Unlike the methods of previous work, which aim to directly predict soft labels to address human uncertainty, we use per-task posterior distributions over soft labels as our training objective, leveraging a Dirichlet prior for analytical accessibility. We demonstrate our approach on two challenging real-world automotive datasets, showing that our model can fully automate a significant portion of tasks, saving costs in the high double-digit percentage range. Our model reliably predicts human uncertainty, allowing for more accurate inspection and filtering of difficult examples. Additionally, we show that the posterior distributions over soft labels predicted by our model can be used as priors in further inference processes, reducing the need for numerous human labelers to approximate true soft labels accurately. This results in further cost reductions and more efficient use of human resources in the annotation process.
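The automation step described above, and pictured in Figure 2, can be sketched as a simple threshold rule. Confident machine predictions are accepted as final. All other tasks go to human annotators, and the predicted Dirichlet is kept as the prior for their votes. In this sketch, the confidence score is the largest entry of the Dirichlet mean. That score is a hypothetical stand-in, because the paper's own confidence measure is not given in this excerpt. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def confidence(alpha: Sequence[float]) -> float:
    """Hypothetical confidence score: the largest entry of the Dirichlet mean alpha / alpha_0."""
    return max(alpha) / sum(alpha)


@dataclass
class RoutingResult:
    # (task index, predicted answer index) for tasks annotated by the machine alone
    automated: List[Tuple[int, int]] = field(default_factory=list)
    # (task index, alpha) for tasks sent to humans; alpha later serves as the prior
    for_humans: List[Tuple[int, List[float]]] = field(default_factory=list)


def route(predicted_alphas: Sequence[Sequence[float]], threshold: float) -> RoutingResult:
    """Accept predictions whose confidence exceeds the threshold; defer the rest to humans."""
    result = RoutingResult()
    for i, alpha in enumerate(predicted_alphas):
        if confidence(alpha) > threshold:
            best = max(range(len(alpha)), key=lambda k: alpha[k])
            result.automated.append((i, best))
        else:
            result.for_humans.append((i, list(alpha)))
    return result


routed = route([[18.0, 1.0, 1.0], [3.0, 3.0, 2.0]], threshold=0.8)
print(routed.automated)   # [(0, 0)]
print(routed.for_humans)  # [(1, [3.0, 3.0, 2.0])]
```

Raising the threshold trades automation for correctness. The automation-correctness curve in Figure 5 sweeps exactly this trade-off.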

Paper Structure (32 sections, 13 equations, 12 figures, 5 tables)

Introduction
Related Work
Truth Inference.
Predictive Crowd Models.
Soft Labels.
A Simple Model of Annotation
Dirichlet-Multinomial Model
Point Estimation.
Why?
Making Sense of Probability Vectors
Ambiguity.
Confidence.
Distance.
Predicting Posterior Distributions
The Data Process.
...and 17 more sections
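The outline lists a Point Estimation subsection under the Dirichlet-Multinomial Model. This excerpt does not show which estimator the paper uses. Two textbook ways to collapse a posterior Dir(alpha) into a single soft label are the mean alpha_k / alpha_0 and, when every alpha_k > 1, the mode (alpha_k - 1) / (alpha_0 - K). The sketch below implements both. The helper names are hypothetical.

```python
from typing import List, Sequence


def dirichlet_mean(alpha: Sequence[float]) -> List[float]:
    """Posterior mean: alpha_k / alpha_0."""
    total = sum(alpha)
    return [a / total for a in alpha]


def dirichlet_mode(alpha: Sequence[float]) -> List[float]:
    """Posterior mode (MAP soft label): (alpha_k - 1) / (alpha_0 - K); needs every alpha_k > 1."""
    if any(a <= 1 for a in alpha):
        raise ValueError("this closed-form mode requires every alpha_k > 1")
    denom = sum(alpha) - len(alpha)
    return [(a - 1) / denom for a in alpha]


alpha = [4.0, 2.0, 2.0]  # e.g. a flat Dir(1, 1, 1) prior updated with votes [3, 1, 1]
print(dirichlet_mean(alpha))  # [0.5, 0.25, 0.25]
print(dirichlet_mode(alpha))  # [0.6, 0.2, 0.2]
```

With a flat prior, the mode reproduces the raw vote frequencies (3/5, 1/5, 1/5), while the mean is pulled toward the uniform distribution. With few human votes per task, this difference matters.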

Figures (12)

Figure 1: A crowd of labelers tackles categorical annotation tasks by providing sets of discrete answers per object. Bayesian inference is then applied to determine posterior distributions over task parameters. We train a machine to predict these actual posteriors, which serve as priors for new tasks, reducing the need for extensive human input in reporting final task responses.
Figure 2: Overview of the process for generating new annotated data from a set of given unannotated image data. A model pre-trained on a small set of data sees unlabeled images for which it predicts the crowd response. If the confidence of the model prediction exceeds a certain threshold, the prediction can be used as ground truth and no further annotation is needed. Otherwise, the prediction is used as a prior distribution for further Bayesian updates.
Figure 3: Distribution of majority votes within each attribute of the Mapillary dataset, stratified by data split. The numbers above the bars give the absolute frequencies with which each majority answer was selected for the respective annotation question. For most of the datasets considered, the distribution of responses is highly non-uniform, i.e., majority responses typically fall into a few categories, while others are strongly underrepresented.
Figure 4: Example predictions of the Dirichlet model for all six attributes of one object of the Mapillary test dataset (top left). As a visualization of the predicted Dirichlet distributions, we show soft bar charts that represent repeatedly sampled realizations of possible response vectors.
Figure 5: Automation-correctness curve for the ECP/human-being test dataset, comparing the hard-label model (blue) with the proposed Dirichlet model (red). The shaded areas indicate approximate $95\%$ confidence intervals computed from the sample quantiles of $B=1024$ bootstrap samples. The curve gives out-of-sample predictions of how correct the auto-annotation model is at a chosen level of automation. Each point on the curve corresponds to a threshold on the model confidence. The threshold determines which automatically annotated instances are trusted and which part of the data is passed to human annotators.
...and 7 more figures
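The automation-correctness curve and bootstrap bands from Figure 5 can be outlined as follows. This sketch rests on several assumptions. Each threshold keeps the tasks whose model confidence is at or above it. Automation is the kept fraction, and correctness is the share of kept predictions that match the reference answer. The 95% band comes from percentiles of B = 1024 bootstrap resamples of the test tasks, matching the caption. The function names and the linear percentile interpolation are illustrative choices, not the paper's code.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def curve_point(conf: Sequence[float], correct: Sequence[bool],
                threshold: float) -> Tuple[float, Optional[float]]:
    """Automation = share of tasks auto-annotated; correctness = accuracy on those tasks."""
    kept = [ok for c, ok in zip(conf, correct) if c >= threshold]
    automation = len(kept) / len(conf)
    correctness = sum(kept) / len(kept) if kept else None  # undefined when nothing is automated
    return automation, correctness


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated q-quantile (0 <= q <= 1) of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_correctness_band(conf: Sequence[float], correct: Sequence[bool], threshold: float,
                               num_boot: int = 1024, level: float = 0.95,
                               seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for correctness at one threshold."""
    rng = random.Random(seed)
    n = len(conf)
    stats: List[float] = []
    for _ in range(num_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample tasks with replacement
        _, corr = curve_point([conf[i] for i in idx], [correct[i] for i in idx], threshold)
        if corr is not None:  # skip resamples in which no task passes the threshold
            stats.append(corr)
    if not stats:
        raise ValueError("no bootstrap resample automated any task at this threshold")
    stats.sort()
    tail = (1 - level) / 2
    return percentile(stats, tail), percentile(stats, 1 - tail)


conf = [0.95, 0.90, 0.80, 0.60, 0.40]
correct = [True, True, False, True, False]
for t in (0.0, 0.85):
    print(t, curve_point(conf, correct, t), bootstrap_correctness_band(conf, correct, t))
```

Sweeping the threshold over a grid and plotting automation against correctness traces the curve. The band at each threshold indicates how much of the apparent gap between two models could be sampling noise.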
