Evidential uncertainty sampling for active learning

Arthur Hoarau, Vincent Lemaire, Arnaud Martin, Jean-Christophe Dubois, Yolande Le Gall

TL;DR

This work tackles active learning by incorporating label (oracle) uncertainty through belief-function theory. It introduces two sampling strategies: Klir uncertainty, which combines discord and non-specificity to capture informative, imprecise labels, and an evidential extension of epistemic uncertainty applicable to multiple classes, computable from model outputs without direct reliance on observations. By leveraging rich labels and mass-function outputs, the methods explicitly separate reducible (epistemic) and irreducible (aleatoric) uncertainty, addressing the exploration–exploitation dilemma via a tunable parameter $\lambda$. Experiments on real-world rich-label data (Credal Dog-2) and standard AL benchmarks show competitive or superior performance to traditional uncertainty sampling, with notable reductions in labeling costs. The framework enables more nuanced uncertainty representation and data-efficient learning, with future work on dynamic exploration–exploitation control and wider adoption of evidential models.
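To make the Klir uncertainty criterion concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual belief-function definitions: non-specificity $N(m) = \sum_A m(A)\log_2|A|$ and discord $D(m) = -\sum_A m(A)\log_2 \mathrm{BetP}(A)$, where $\mathrm{BetP}$ is the pignistic probability. It also assumes the convention that $\lambda$ weights non-specificity and $1-\lambda$ weights discord. The dictionary representation of a mass function and the names `klir_uncertainty` and `select_query` are hypothetical, chosen for illustration only.

```python
from math import log2

# A mass function is represented (by assumption) as a dict that maps each
# focal element, a non-empty frozenset of class labels, to its mass.
# Masses are assumed to be positive and to sum to 1.

def pignistic(mass, subset):
    """BetP(subset) = sum over focal sets B of m(B) * |subset & B| / |B|."""
    return sum(m * len(subset & focal) / len(focal) for focal, m in mass.items())

def non_specificity(mass):
    """Imprecision: mass placed on large sets of labels."""
    return sum(m * log2(len(focal)) for focal, m in mass.items())

def discord(mass):
    """Conflict between the focal elements."""
    return -sum(m * log2(pignistic(mass, focal)) for focal, m in mass.items())

def klir_uncertainty(mass, lam=0.5):
    """Weighted combination; the lam convention is an assumption."""
    return lam * non_specificity(mass) + (1 - lam) * discord(mass)

def select_query(pool_masses, lam=0.5):
    """Index of the unlabeled sample with the highest Klir uncertainty."""
    scores = [klir_uncertainty(m, lam) for m in pool_masses]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy model outputs on two classes (illustrative values, not from the paper).
certain = {frozenset({"a"}): 1.0}                          # N = 0, D = 0
uniform = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}   # N = 0, D = 1
vacuous = {frozenset({"a", "b"}): 1.0}                     # N = 1, D = 0

pool = [certain, uniform, vacuous]
print(select_query(pool, lam=1.0))  # favors imprecision (the vacuous output)
print(select_query(pool, lam=0.0))  # favors conflict (the uniform output)
```

Under these assumptions, a large $\lambda$ steers queries toward samples where the model is ignorant, and a small $\lambda$ steers them toward samples where the classes conflict, which is one way to read the exploration–exploitation trade-off mentioned above.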

Abstract

Recent studies in active learning, particularly in uncertainty sampling, have focused on decomposing model uncertainty into reducible and irreducible components. This paper aims to simplify the computation while eliminating the dependence on observations. Crucially, it also accounts for the inherent uncertainty in the labels, that is, the uncertainty of the oracles. Two strategies are proposed, both based on the theory of belief functions: sampling by Klir uncertainty, which tackles the exploration-exploitation dilemma, and sampling by evidential epistemic uncertainty, which extends the concept of reducible uncertainty to the evidential framework. Experimental results in active learning show that the proposed methods can outperform uncertainty sampling.

Paper Structure (17 sections, 19 equations, 13 figures, 2 tables)


Figures (13)

  • Figure 1: Illustrating the exploration-exploitation dilemma in active learning: complete dataset vs. active learning iterations.
  • Figure 2: Visualization of uncertainty areas in two-dimensional datasets.
  • Figure 3: Illustration of reducible and irreducible uncertainties in a coin toss experiment (and a Finnish word representation).
  • Figure 4: Visualization of model uncertainty and sample evolution in two-class datasets.
  • Figure 5: Representation of aleatoric and epistemic uncertainties in model predictions according to Fig. \ref{subfig:log}.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Example 1
  • Example 2
  • Example 3