
Automated Machine Learning for Positive-Unlabelled Learning

Jack D. Saunders, Alex A. Freitas

TL;DR

This work addresses the challenge of selecting effective PU learning strategies by introducing two Auto-ML systems, BO-Auto-PU and EBO-Auto-PU, which complement the original GA-Auto-PU. The two new systems use, respectively, Bayesian optimisation and a surrogate-assisted evolutionary approach to efficiently search a two-step PU learning framework, in both a base search space and an extended, spy-based search space. In extensive experiments on 60 PU datasets derived from 20 real-world biomedical datasets, the Auto-PU methods generally outperform well-established PU learning baselines in F-measure and precision, with EBO-Auto-PU offering a favourable balance of predictive performance and computational efficiency. The results highlight the value of Auto-ML for PU learning and point to future work on expanded search spaces and on jointly optimising the hyperparameters of the Auto-PU systems themselves.
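As a minimal sketch of the general two-step, spy-based idea (not the Auto-PU implementation), the code below first moves a random subset of labelled positives ("spies") into the unlabelled set, scores everything with a classifier trained to separate the remaining positives from this mixed set, and keeps as reliable negatives the unlabelled instances that score below all but at most a `noise_level` fraction of the spies (step 1). It then trains a final classifier on positives versus those reliable negatives (step 2). Assumptions: a toy nearest-centroid scorer stands in for a real classifier, and the function names and the `spy_fraction` and `noise_level` parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Scorer = Callable[[Point], float]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid_scorer(pos: Sequence[Point], neg: Sequence[Point]) -> Scorer:
    """Hypothetical stand-in classifier: score > 0 means closer to the positive centroid."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)


def spy_reliable_negatives(positives, unlabelled, spy_fraction=0.15,
                           noise_level=0.1, seed=0) -> List[Point]:
    """Step 1: pick reliable negatives from the unlabelled set via spies."""
    if len(positives) < 2 or not 0.0 <= noise_level < 1.0:
        raise ValueError("need >= 2 positives and 0 <= noise_level < 1")
    rng = random.Random(seed)
    # Keep at least one positive out of the spy set so both centroids exist.
    n_spies = min(len(positives) - 1, max(1, round(spy_fraction * len(positives))))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    kept = [p for i, p in enumerate(positives) if i not in spy_idx]
    # Treat unlabelled + spies as provisional negatives and fit P vs U.
    score = fit_centroid_scorer(kept, list(unlabelled) + spies)
    # Threshold: at most a `noise_level` fraction of spies score strictly below it.
    spy_scores = sorted(score(s) for s in spies)
    threshold = spy_scores[int(noise_level * len(spy_scores))]
    return [u for u in unlabelled if score(u) < threshold]


def two_step_pu(positives, unlabelled, **kwargs):
    """Step 2: fit the final classifier on positives vs reliable negatives."""
    reliable_neg = spy_reliable_negatives(positives, unlabelled, **kwargs)
    if not reliable_neg:
        raise ValueError("step 1 found no reliable negatives")
    return fit_centroid_scorer(positives, reliable_neg), reliable_neg
```

For example, `two_step_pu(positives, unlabelled, seed=1)` returns the final scorer together with the reliable negatives it was trained against. Replacing the centroid scorer with any real classifier leaves the two-step structure unchanged; the choice of classifier, spy fraction and threshold is the kind of decision a search over a two-step PU framework could cover.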

Abstract

Positive-Unlabelled (PU) learning is a growing field of machine learning that aims to learn classifiers from data consisting of labelled positive instances and unlabelled instances, which may in reality be positive or negative but whose labels are unknown. So many methods have been proposed to address PU learning over the last two decades that selecting an optimal method for a given PU learning task is itself a challenge. Our previous work addressed this by proposing GA-Auto-PU, the first Automated Machine Learning (Auto-ML) system for PU learning. In this work, we propose two new Auto-ML systems for PU learning: BO-Auto-PU, based on a Bayesian Optimisation approach, and EBO-Auto-PU, based on a novel evolutionary/Bayesian optimisation approach. We also present an extensive evaluation of the three Auto-ML systems, comparing them to each other and to well-established PU learning methods across 60 datasets (20 real-world datasets, each with 3 versions that differ in their PU learning characteristics).
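The PU setting described in the abstract can be illustrated with one common way of deriving a PU version of a fully labelled dataset: hide the labels of a random fraction of the positives, so that the unlabelled set contains those hidden positives plus every negative. This is a sketch under stated assumptions; the abstract does not say how the three versions of each dataset were built, and `make_pu_labels` and `hide_fraction` are hypothetical names.

```python
import random
from typing import List


def make_pu_labels(y: List[int], hide_fraction: float, seed: int = 0) -> List[int]:
    """Turn fully labelled binary targets into PU labels (hypothetical helper).

    y: true labels (1 = positive, 0 = negative).
    hide_fraction: hypothetical parameter, the fraction of positives whose
        label is hidden, i.e. moved into the unlabelled set.
    Returns s, where s[i] = 1 means "labelled positive" and s[i] = 0 means
    "unlabelled" (truly positive or negative, but unknown to the learner).
    """
    if not 0.0 <= hide_fraction <= 1.0:
        raise ValueError("hide_fraction must be in [0, 1]")
    rng = random.Random(seed)
    positive_idx = [i for i, label in enumerate(y) if label == 1]
    n_hidden = round(hide_fraction * len(positive_idx))
    hidden = set(rng.sample(positive_idx, n_hidden))
    # Negatives are always unlabelled; the hidden positives join them.
    return [1 if label == 1 and i not in hidden else 0 for i, label in enumerate(y)]
```

With 10 positives and `hide_fraction=0.4`, for instance, 6 positives stay labelled and the other 4 become indistinguishable from the negatives in the unlabelled set.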

Paper Structure

This paper contains 28 sections, 3 equations, 4 figures, 16 tables, 3 algorithms.

Figures (4)

  • Figure 1: Example of a candidate solution in the base search space.
  • Figure 2: Example of a candidate solution in the extended search space.
  • Figure 3: Average F-measure results comparison for three versions of Auto-PU utilising the base search space and the PU learning baselines across the three values of $\delta$.
  • Figure 4: Average F-measure results comparison for three versions of Auto-PU utilising the base search space and the PU learning baselines across the three values of $\delta$.