Bayesian Active Learning for Classification and Preference Learning
Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, Máté Lengyel
TL;DR
The paper introduces BALD (Bayesian Active Learning by Disagreement), a Bayesian information-theoretic active learning objective, and applies it to Gaussian Process classifiers by rewriting the information gain in terms of predictive entropies in output space. Under a probit likelihood it derives an analytic, near-exact expression for the BALD criterion, and it extends the approach to GP-based preference learning via a difference kernel. Empirical results on classification and preference tasks indicate that BALD often outperforms other active learning methods at equal or lower computational cost, and the method is agnostic to the underlying approximate inference technique. The work also discusses hyperparameter learning within BALD and situates the approach relative to related methods, highlighting practical advantages for nonparametric models.
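To make the probit-likelihood criterion concrete, the following is a minimal sketch of how a BALD score could be computed for one candidate input. It assumes the GP posterior over the latent function at that input has already been summarised as a Gaussian with mean `mean` and variance `var` by whatever approximate inference method is in use. The closed form for the expected conditional entropy, including the constant `C = sqrt(pi * ln 2 / 2)`, follows our reading of the paper's approximation. The function names (`bald_score`, `select_query`) are illustrative and not taken from the authors' code.

```python
import math

# Constant in the Gaussian approximation to the binary entropy of a probit output.
C = math.sqrt(math.pi * math.log(2) / 2)


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binary_entropy_bits(p):
    """Entropy (in bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bald_score(mean, var):
    """Approximate mutual information between the label and the latent function.

    mean, var: posterior mean and variance of the latent f(x) (assumed Gaussian).
    """
    # Entropy of the marginal predictive distribution p(y | x, D).
    marginal = binary_entropy_bits(normal_cdf(mean / math.sqrt(var + 1.0)))
    # Approximate expected entropy of p(y | x, f) under the posterior on f.
    expected = C * math.exp(-mean ** 2 / (2.0 * (var + C ** 2))) / math.sqrt(var + C ** 2)
    return marginal - expected


def select_query(candidates):
    """Return the index of the (mean, var) pair with the highest BALD score."""
    scores = [bald_score(m, v) for m, v in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch a point scores highly only when the marginal prediction is uncertain and that uncertainty comes from disagreement about the latent function (large `var`). A point near the decision boundary with a well-determined latent value scores close to zero.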
Abstract
Information-theoretic active learning has been widely studied for probabilistic models. For simple regression, an optimal myopic policy is easily tractable. However, for other tasks and with more complex models, such as classification with nonparametric models, the optimal solution is harder to compute. Current approaches make approximations to achieve tractability. We propose an approach that expresses information gain in terms of predictive entropies, and apply this method to the Gaussian Process Classifier (GPC). Our approach makes minimal approximations to the full information-theoretic objective. Our experimental performance compares favourably to many popular active learning algorithms, and our method has equal or lower computational complexity. We also compare well to decision-theoretic approaches, which are privy to more information and require much more computational time. Second, by further developing a reformulation of binary preference learning as a classification problem, we extend our algorithm to Gaussian Process preference learning.
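The reformulation of preference learning as classification can be illustrated with a short sketch. It assumes, as we understand the construction, that a preference "u is preferred to v" is modelled through the difference of latent values f(u) - f(v). If f has a GP prior with base kernel k, that difference is itself a GP over pairs, with the covariance computed below. The RBF base kernel, its `lengthscale` parameter, and the function names are illustrative choices, not details given in the text above.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential base kernel on feature vectors (illustrative choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def preference_kernel(pair1, pair2, base=rbf):
    """Covariance between g(u1, v1) = f(u1) - f(v1) and g(u2, v2) = f(u2) - f(v2)."""
    (u1, v1), (u2, v2) = pair1, pair2
    return base(u1, u2) + base(v1, v2) - base(u1, v2) - base(v1, u2)


# Example items, each described by a two-dimensional feature vector.
a, b, c = (0.0, 0.0), (1.0, 0.5), (2.0, -1.0)
k_ab_ac = preference_kernel((a, b), (a, c))
k_ba_ac = preference_kernel((b, a), (a, c))  # first pair reversed: sign flips
```

With this kernel, a standard GP binary classifier over pairs can in principle be reused for preference data, so an active learning criterion defined for classification carries over to choosing which pair to query next.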
