Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data

Jialei Liu; Jun Liao; Kuangnan Fang

TL;DR

The paper tackles positive-unlabeled (PU) learning under privacy constraints by leveraging information from multiple heterogeneous source domains. It introduces TLMA-PU, a framework that fits domain-specific logistic regression models for fully labeled, semi-supervised, and PU sources and then transfers knowledge through a weighted average of parameters, with weights optimized by cross-validated KL-divergence criteria. The authors establish asymptotic weight optimality under misspecification, and weight convergence when the target model is correctly specified, including extensions to high-dimensional sparse settings. Empirical results from simulations and a real credit-risk dataset show that TLMA-PU improves predictive accuracy and robustness, especially with limited labeled target data and diverse data sources. The approach preserves privacy by sharing only parameter vectors and offers a principled, theoretically grounded solution for cross-domain PU learning in practical risk-control tasks.

Abstract

Positive-Unlabeled (PU) learning presents unique challenges due to the lack of explicitly labeled negative samples, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel transfer learning framework based on model averaging that integrates information from heterogeneous data sources (fully binary-labeled, semi-supervised, and PU data sets) without direct data sharing. For each type of source domain, a tailored logistic regression model is fitted, and knowledge is transferred to the PU target domain through model averaging. Optimal weights for combining source models are determined via a cross-validation criterion that minimizes the Kullback-Leibler divergence. We establish theoretical guarantees for weight optimality and convergence, covering both misspecified and correctly specified target models, with further extensions to high-dimensional settings using sparsity-penalized estimators. Extensive simulations and an analysis of real-world credit risk data demonstrate that our method outperforms competing methods in predictive accuracy and robustness, especially when labeled data are limited and the environments are heterogeneous.
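
The parameter-averaging step described above can be illustrated with a small sketch. This is not the authors' implementation, and it rests on several simplifying assumptions: the held-out target labels are treated as fully observed (the paper's target domain is PU), a single validation set stands in for cross-validation, and exponentiated-gradient descent is swapped in as a convenient optimizer over the weight simplex. The function names (`neg_log_lik`, `average_weights`) and the synthetic data are hypothetical. The idea shown is only that each domain shares a parameter vector, and the weights are chosen to minimize a logistic log-loss, which matches a KL-divergence criterion up to a constant that does not depend on the weights.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def neg_log_lik(beta, X, y):
    """Average logistic negative log-likelihood of beta on (X, y).

    Up to a term that does not depend on beta, this is the empirical
    KL divergence between the labels and the fitted model.
    """
    z = X @ beta
    # log(1 + exp(z)) - y * z, with logaddexp for numerical stability
    return np.mean(np.logaddexp(0.0, z) - y * z)


def average_weights(betas, X_val, y_val, n_iter=500, step=0.5):
    """Choose simplex weights for averaging per-domain parameter vectors.

    betas: array of shape (K, p), one fitted parameter vector per domain.
    Uses exponentiated-gradient updates (an assumption of this sketch),
    which keep the weights nonnegative and summing to one.
    """
    K = betas.shape[0]
    w = np.full(K, 1.0 / K)  # start from equal weights
    n = len(y_val)
    for _ in range(n_iter):
        beta_w = w @ betas                        # averaged parameters, shape (p,)
        resid = sigmoid(X_val @ beta_w) - y_val   # gradient of the loss in z
        grad_beta = X_val.T @ resid / n           # gradient w.r.t. beta_w
        grad_w = betas @ grad_beta                # chain rule: gradient w.r.t. w
        w = w * np.exp(-step * grad_w)            # multiplicative update
        w /= w.sum()                              # renormalize onto the simplex
    return w


# Synthetic illustration: only parameter vectors cross domain boundaries.
rng = np.random.default_rng(0)
X_val = rng.normal(size=(400, 5))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y_val = (rng.uniform(size=400) < sigmoid(X_val @ beta_true)).astype(float)

# Three hypothetical domain models: informative, adversarial, uninformative.
betas = np.vstack([beta_true, -beta_true, np.zeros(5)])
weights = average_weights(betas, X_val, y_val)
print("weights:", np.round(weights, 3))
print("log-loss:", neg_log_lik(weights @ betas, X_val, y_val))
```

Because the loss is convex in the weights, the multiplicative update shifts mass toward domains whose parameters reduce the target log-loss, so the uninformative and adversarial vectors in this toy setup receive less weight than the informative one.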
