Automated Machine Learning for Positive-Unlabelled Learning
Jack D. Saunders, Alex A. Freitas
TL;DR
This work addresses the challenge of selecting an effective PU learning method by introducing two Auto-ML systems, BO-Auto-PU and EBO-Auto-PU, that complement the original GA-Auto-PU. BO-Auto-PU is based on Bayesian optimisation, while EBO-Auto-PU combines evolutionary search with Bayesian optimisation in a surrogate-assisted hybrid. Both systems search the configurations of a two-step PU learning framework, over a base search space and an extended, spy-based one. In extensive experiments on 60 engineered PU datasets (20 real-world datasets from biomedical sources, each in 3 PU versions), the Auto-PU methods generally outperform strong baselines in F-measure and precision, with EBO-Auto-PU offering a favourable balance of predictive performance and computational efficiency. The results highlight the value of Auto-ML in PU learning and point to future work on expanded search spaces and on optimising the hyperparameters of the Auto-PU systems themselves.
Abstract
Positive-Unlabelled (PU) learning is a growing field of machine learning that aims to learn classifiers from data consisting of labelled positive instances and unlabelled instances, where each unlabelled instance may in reality be positive or negative but its label is unknown. A large number of methods have been proposed for PU learning over the last two decades, so many that selecting an optimal method for a given PU learning task presents a challenge. Our previous work addressed this by proposing GA-Auto-PU, the first Automated Machine Learning (Auto-ML) system for PU learning. In this work, we propose two new Auto-ML systems for PU learning: BO-Auto-PU, based on a Bayesian Optimisation approach, and EBO-Auto-PU, based on a novel evolutionary/Bayesian optimisation approach. We also present an extensive evaluation of the three Auto-ML systems, comparing them to each other and to well-established PU learning methods across 60 datasets (20 real-world datasets, each with 3 versions in terms of PU learning characteristics).
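
To make the PU setting concrete, the sketch below shows one common way a fully labelled binary dataset can be turned into a PU dataset: a fraction of the positive instances have their labels hidden and are merged with the negatives into the unlabelled set. The function name `make_pu_labels`, the toy label vector, and the three hiding fractions are illustrative assumptions; the abstract does not state how the three PU versions of each dataset were built.

```python
import numpy as np

def make_pu_labels(y, hide_frac, seed=0):
    """Return PU labels s, where 1 = labelled positive and 0 = unlabelled.

    y         : true binary labels (1 = positive, 0 = negative)
    hide_frac : fraction of true positives whose label is hidden (assumed knob)
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)               # indices of true positives
    n_hide = int(round(hide_frac * pos_idx.size))  # how many positives to hide
    hidden = rng.choice(pos_idx, size=n_hide, replace=False)
    s = y.copy()
    s[hidden] = 0                                  # hidden positives join the unlabelled set
    return s

# Toy data: 10 true positives followed by 10 true negatives.
y_true = np.array([1] * 10 + [0] * 10)

# Three hypothetical PU versions of the same dataset.
for frac in (0.2, 0.4, 0.6):
    s = make_pu_labels(y_true, frac)
    print(f"hide_frac={frac}: labelled positives={int(s.sum())}, "
          f"unlabelled={int((s == 0).sum())}")
```

A PU learner sees only `s`, never `y_true`, so the unlabelled set is a mixture of true negatives and hidden positives. The true labels are typically kept aside for evaluation only.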
