Meta-learning for Positive-unlabeled Classification
Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara
TL;DR
This work tackles positive-unlabeled (PU) classification in data-scarce, cross-task settings by introducing a meta-learning framework that adapts to PU data from unseen tasks. The key idea is to estimate the Bayes optimal classifier from a task-specific density ratio between the positive and unlabeled (marginal) densities, together with the positive class prior. Task representations computed by permutation-invariant networks yield flexible, task-conditioned embeddings. The adaptation admits a closed-form solution for the density-ratio parameters, which enables efficient meta-training that minimizes the test classification risk across related tasks. Empirical results on synthetic and real-world datasets show the proposed method outperforming standard PU methods and PU-aware meta-learning baselines, with robust performance even when the target class priors are unknown. The approach could enable rapid, data-efficient PU learning in applications such as outlier detection, information retrieval, and personalized systems.
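For reference, the standard relation behind estimating the Bayes optimal classifier from a density ratio and a class prior can be written out as follows. This assumes that unlabeled data are drawn from the marginal density and that the classification risk uses the 0-1 loss. The symbols (π for the positive class prior, p_p, p_n, p_u for the positive, negative, and unlabeled densities, r for the ratio) are our own notation, not necessarily the paper's.

```latex
\begin{aligned}
p_u(x) &= \pi\, p_p(x) + (1 - \pi)\, p_n(x), \\
r(x) &= \frac{p_p(x)}{p_u(x)}, \\
p(y = +1 \mid x) &= \frac{\pi\, p_p(x)}{p_u(x)} = \pi\, r(x), \\
f^{*}(x) &= \operatorname{sign}\!\left(\pi\, r(x) - \tfrac{1}{2}\right).
\end{aligned}
```

The third line is Bayes' rule with p(x | y = +1) = p_p(x), p(y = +1) = π, and p(x) = p_u(x). Thresholding the posterior at 1/2 gives the classifier that minimizes the expected 0-1 risk, so estimating r and π from PU data is enough to recover it.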
Abstract
We propose a meta-learning method for positive and unlabeled (PU) classification, which improves the performance of binary classifiers obtained from only PU data in unseen target tasks. PU learning is an important problem since PU data naturally arise in real-world applications such as outlier detection and information retrieval. Existing PU learning methods require a large amount of PU data, but sufficient data are often unavailable in practice. Using related tasks that consist of positive, negative, and unlabeled data, the proposed method minimizes the test classification risk after the model is adapted to PU data. We formulate the adaptation as an estimation problem of the Bayes optimal classifier, that is, the classifier that minimizes the classification risk. The proposed method embeds each instance into a task-specific space using neural networks. With the embedded PU data, the Bayes optimal classifier is estimated through density-ratio estimation of PU densities, whose solution is obtained in closed form. The closed-form solution enables us to minimize the test classification risk efficiently and effectively. We empirically show that the proposed method outperforms existing methods on one synthetic and three real-world datasets.
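The following is a minimal sketch of why a density-ratio step can have a closed-form solution. It rests on several assumptions that do not come from the abstract. First, the learned task-specific neural embedding is replaced by fixed Gaussian basis functions. Second, the linear ratio model r(x) = φ(x)ᵀα is fitted with a least-squares (uLSIF-style) objective, which is a swapped-in technique since the paper's exact objective is not stated here. Third, the positive class prior is treated as known. The helper names `gaussian_features`, `fit_density_ratio`, and `predict_positive` are hypothetical.

```python
import numpy as np


def gaussian_features(x, centers, width):
    """Gaussian basis phi(x); a hypothetical stand-in for the learned task-specific embedding."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def fit_density_ratio(x_pos, x_unl, centers, width=1.0, reg=1e-2):
    """Closed-form least-squares fit of r(x) = p_p(x) / p_u(x) with r(x) = phi(x) @ alpha.

    Minimizes 0.5 * mean_u[r(x)^2] - mean_p[r(x)] + 0.5 * reg * ||alpha||^2,
    whose unique minimizer is alpha = (H + reg * I)^{-1} h.
    """
    phi_u = gaussian_features(x_unl, centers, width)
    phi_p = gaussian_features(x_pos, centers, width)
    H = phi_u.T @ phi_u / len(x_unl)  # second moment of features under unlabeled data
    h = phi_p.mean(axis=0)  # mean of features under positive data
    return np.linalg.solve(H + reg * np.eye(len(centers)), h)


def predict_positive(x, alpha, centers, prior, width=1.0):
    """Plug-in Bayes classifier: predict positive iff prior * r(x) >= 1/2."""
    ratio = np.maximum(gaussian_features(x, centers, width) @ alpha, 0.0)
    posterior = np.clip(prior * ratio, 0.0, 1.0)
    return posterior >= 0.5


# Toy 1-D task: positives ~ N(2, 1), negatives ~ N(-2, 1), class prior 0.4 (assumed known).
rng = np.random.default_rng(0)
prior = 0.4
x_pos = rng.normal(2.0, 1.0, size=(200, 1))
n_unl = 500
is_pos = rng.random(n_unl) < prior
x_unl = np.where(
    is_pos[:, None],
    rng.normal(2.0, 1.0, size=(n_unl, 1)),
    rng.normal(-2.0, 1.0, size=(n_unl, 1)),
)
centers = x_unl[:50]  # basis centers taken from the unlabeled data
alpha = fit_density_ratio(x_pos, x_unl, centers)
preds = predict_positive(np.array([[2.0], [-2.0]]), alpha, centers, prior)
```

Because α is the solution of a regularized linear system, it is differentiable with respect to the features. In a meta-learning setting, this is what would let gradients of the test risk flow back into an embedding network; the sketch above does not include that outer loop.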
