Learning from Stochastic Labels
Meng Wei, Zhongnian Li, Yong Zhou, Qiaoyu Guo, Xinzheng Xu
TL;DR
The paper introduces stochastic labels, a cost-saving labeling scheme for multiclass classification: the annotator is shown a small random subset of candidate labels and either picks the ground-truth label from it or, if the subset does not contain the ground-truth label, annotates the instance as None. The paper develops an unbiased risk estimator that splits the risk between labeled and None-annotated instances, and establishes an estimation-error bound showing that the learned classifier converges as the amount of data grows. The approach is validated on several benchmarks (MNIST, Fashion, Kuzushiji, CIFAR-10), where it performs competitively with or better than weakly supervised and ordinary-label methods, particularly as the label-set size shrinks. Together, the theoretical guarantees and the experimental evidence indicate that stochastic labeling can substantially reduce labeling effort without sacrificing accuracy.
Abstract
Annotating multi-class instances is a crucial task in machine learning. Unfortunately, identifying the correct class label from a long list of candidate labels is time-consuming and laborious. To alleviate this problem, we design a novel labeling mechanism called stochastic labels. A stochastic label arises in one of two cases: 1) the annotator identifies the correct class label from a small number of randomly given labels; 2) the annotator marks the instance with a None label when the given labels do not contain the correct class label. In this paper, we propose a novel approach to learning from stochastic labels. We derive an unbiased estimator that uses the limited supervised information in stochastic labels to train a multi-class classifier. We further justify the method theoretically by deriving its estimation error bound. Finally, we conduct extensive experiments on widely used benchmark datasets and compare against existing state-of-the-art methods to validate the superiority of our method.
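The two annotation cases described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the paper's procedure: it assumes the candidate labels are drawn uniformly at random without replacement, and the names `stochastic_label`, `num_classes`, and `subset_size` are hypothetical. Under that assumption, the correct label lands in the candidate set with probability `subset_size / num_classes`, so the remaining instances receive a None annotation.

```python
import random


def stochastic_label(true_label, num_classes, subset_size, rng):
    """Simulate one stochastic-label annotation (illustrative only).

    Assumption: candidates are sampled uniformly without replacement.
    Returns the candidate set shown to the annotator and the annotation,
    which is the true label if it is among the candidates, else None.
    """
    candidates = rng.sample(range(num_classes), subset_size)
    annotation = true_label if true_label in candidates else None
    return candidates, annotation


rng = random.Random(0)            # fixed seed for reproducibility
num_classes, subset_size = 10, 3  # hypothetical settings

# Annotate 10,000 instances whose true labels are drawn uniformly.
samples = [
    stochastic_label(rng.randrange(num_classes), num_classes, subset_size, rng)
    for _ in range(10_000)
]

# Fraction of instances that received a real label rather than None.
observed = sum(a is not None for _, a in samples) / len(samples)
expected = subset_size / num_classes
print(f"labeled fraction: {observed:.3f} (expected about {expected:.3f})")
```

Shrinking `subset_size` lowers the annotator's per-instance effort but raises the share of None annotations, which is the trade-off a learning method for stochastic labels has to handle.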
