
Learning from Stochastic Labels

Meng Wei, Zhongnian Li, Yong Zhou, Qiaoyu Guo, Xinzheng Xu

TL;DR

The paper introduces stochastic labels, a cost-saving labeling scheme for multi-class classification: the annotator either picks the ground-truth label from a small random subset of candidate labels or marks the instance None when the subset does not contain it. It derives an unbiased risk estimator that splits the risk across labeled and None-annotated cases, and proves an estimation-error bound showing convergence as the amount of data grows. Experiments on MNIST, Fashion, Kuzushiji, and CIFAR-10 show competitive or superior performance relative to weakly supervised and ordinary-label methods, particularly as the label-set size shrinks. Together, the theoretical guarantees and the experiments suggest that stochastic labeling can reduce labeling effort while keeping accuracy competitive.

Abstract

Annotating multi-class instances is a crucial task in machine learning. Unfortunately, identifying the correct class label from a long sequence of candidate labels is time-consuming and laborious. To alleviate this problem, we design a novel labeling mechanism called the stochastic label. A stochastic label covers two cases: 1) the annotator identifies the correct class label from a small number of randomly given labels; 2) the annotator marks the instance with a None label when the given labels do not contain the correct class label. In this paper, we propose a novel approach to learning from these stochastic labels. We obtain an unbiased estimator that uses the limited supervision information in stochastic labels to train a multi-class classifier. We justify the method theoretically by deriving its estimation error bound. Finally, we conduct extensive experiments on widely used benchmark datasets, comparing our method with existing state-of-the-art methods to validate its superiority.
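
The two cases of the labeling mechanism can be illustrated with a small simulation. This is a minimal sketch, not the authors' procedure: it assumes the candidate labels are drawn uniformly at random without replacement, and the names `make_stochastic_label` and `none_rate` are hypothetical helpers introduced only for illustration. It uses the 10-class, 3-candidate setting of Figure 1.

```python
import random


def make_stochastic_label(true_label, num_classes, subset_size, rng):
    """Simulate one annotation (assumed uniform sampling without replacement).

    Returns the candidate labels shown to the annotator and the resulting
    annotation: the true label if it is among the candidates, else None.
    """
    candidates = rng.sample(range(num_classes), subset_size)
    label = true_label if true_label in candidates else None
    return candidates, label


def none_rate(num_classes, subset_size):
    """Probability of a None annotation under the uniform-sampling assumption."""
    return 1.0 - subset_size / num_classes


# Setting from Figure 1: 10 classes, 3 candidate labels per instance.
rng = random.Random(0)
true_labels = [rng.randrange(10) for _ in range(10000)]
annotations = [make_stochastic_label(y, 10, 3, rng) for y in true_labels]

# Fraction of instances that end up annotated None.
empirical_none = sum(label is None for _, label in annotations) / len(annotations)
```

Under this assumption a fraction of roughly `1 - 3/10` of the instances receives a None annotation, so most of the training signal comes from the None case, which is why the estimator has to account for it explicitly.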

Paper Structure

This paper contains 16 sections, 4 theorems, 18 equations, 2 figures, 5 tables, 1 algorithm.

Key Result

Theorem 3.1

For any instance $\mathbf{x}$ with ground-truth label $y$ and stochastic label set $\tilde{Y}$, the following equality holds:

Figures (2)

  • Figure 1: A comparison between the ordinary label (left) and the stochastic label (right). The selected label is ticked. For the same instance, with ordinary labels crowdsourced workers must identify the correct class label from 10 classes. With stochastic labels, they only need to select from 3 stochastic labels (Dog, Monkey, Panda) or annotate None.
  • Figure 2: Experimental results: test classification accuracy on various datasets. The dark colors show the mean accuracy over 5 trials and the light colors show the standard deviation.

Theorems & Definitions (4)

  • Theorem 3.1
  • Theorem 3.2
  • Lemma 3.3
  • Theorem 3.4