Improving Active Learning with a Bayesian Representation of Epistemic Uncertainty
Jake Thomas, Jeremie Houssineau
TL;DR
This work tackles the challenge of reducing epistemic uncertainty (EU) in active learning by moving beyond purely probabilistic uncertainty representations and adopting a hybrid probabilistic-possibilistic framework. It introduces outer probability measures (OPMs) and possibility functions to explicitly model EU, and defines a possibilistic Gaussian process (PGP) to enable Bayesian-like inference within this framework. Two local, update-free acquisition strategies are developed—one based on a novel EU measure and another on the necessity of correct classification—together with probabilistic class-conditional updates via Laplace approximations. Empirical results on synthetic and real datasets demonstrate that the proposed approaches yield strong performance across binary and multiclass classification tasks, often outperforming standard baselines while maintaining computational efficiency, thereby offering a robust alternative for uncertainty-aware active learning.
Abstract
A popular strategy for active learning is to specifically target a reduction in epistemic uncertainty, since aleatoric uncertainty is often considered as being intrinsic to the system of interest and therefore not reducible. Yet, distinguishing these two types of uncertainty remains challenging and there is no single strategy that consistently outperforms the others. We propose to use a particular combination of probability and possibility theories, with the aim of using the latter to specifically represent epistemic uncertainty, and we show how this combination leads to new active learning strategies that have desirable properties. In order to demonstrate the efficiency of these strategies in non-trivial settings, we introduce the notion of a possibilistic Gaussian process (GP) and consider GP-based multiclass and binary classification problems, for which the proposed methods display a strong performance for both simulated and real datasets.
