Collaborative Active Learning in Conditional Trust Environment
Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng
TL;DR
The paper tackles privacy-preserving collaborative active learning in a conditional trust setting, where participants do not share their data or models. It introduces the Conditionally Collaborative Active Learning (C2AL) framework, in which collaborators exchange prediction results at Level-1 and Level-2 while a coordinator selects which labels to query; the design emphasizes monotonic improvement as labels are added and ensemble adaptation. In simulations on synthetic data, collaborative learning substantially outperforms independent efforts: AUC rises from roughly $0.50$–$0.59$ for a lone learner to about $0.80$–$0.85$ in a four-collaborator setup, with shared predictions shaping feature importance. The work demonstrates the practical viability of privacy- and cost-conscious collaboration in active learning and lays a foundation for extending C2AL to real-world domains with strict data and confidentiality constraints.
Abstract
In this paper, we investigate collaborative active learning, a paradigm in which multiple collaborators explore a new domain by leveraging their combined machine learning capabilities without disclosing their existing data and models. Instead, the collaborators share prediction results from the new domain and newly acquired labels. This collaboration offers several advantages: (a) it addresses privacy and security concerns by eliminating the need for direct model and data disclosure; (b) it enables the use of different data sources and insights without direct data exchange; and (c) it promotes cost-effectiveness and resource efficiency through shared labeling costs. To realize these benefits, we introduce a collaborative active learning framework and validate its effectiveness through simulations. The results demonstrate that collaboration yields higher AUC scores than independent efforts, highlighting the framework's ability to overcome the limitations of individual models. These findings support the use of collaborative approaches in active learning, emphasizing their potential to enhance outcomes through collective expertise and shared resources. Our work provides a foundation for further research on collaborative active learning and its practical applications in domains where data privacy, cost efficiency, and model performance are critical considerations.
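To make the information flow concrete, the following is a minimal sketch of one collaboration round under stated assumptions; it is not the authors' implementation. The names `Collaborator`, `select_queries`, `run_round`, and `oracle` are hypothetical, the private model is a toy logistic scorer, and ranking samples by the variance of the shared predictions is an assumed selection rule chosen for illustration only. The point it illustrates is that only predictions and newly acquired labels cross the boundary between collaborators, never data or model parameters.

```python
import math
from statistics import pvariance


class Collaborator:
    """Hypothetical participant: holds a private model, exposes only predictions."""

    def __init__(self, weights):
        self._weights = weights  # private parameters, never shared

    def predict(self, pool):
        # Toy logistic scorer: one probability-like score in (0, 1) per sample.
        scores = []
        for row in pool:
            z = sum(w * x for w, x in zip(self._weights, row))
            scores.append(1.0 / (1.0 + math.exp(-z)))
        return scores


def select_queries(shared_predictions, budget):
    """Coordinator step (assumed rule): pick the samples on which
    the collaborators' shared predictions disagree the most."""
    n_samples = len(shared_predictions[0])
    disagreement = [
        pvariance([preds[i] for preds in shared_predictions])
        for i in range(n_samples)
    ]
    ranked = sorted(range(n_samples), key=lambda i: disagreement[i], reverse=True)
    return ranked[:budget]


def run_round(collaborators, pool, oracle, budget):
    """One round: share predictions, query labels, return labels shared with all."""
    shared = [c.predict(pool) for c in collaborators]
    queried = select_queries(shared, budget)
    # `oracle` stands in for the (costly) labeling source in the new domain.
    return {i: oracle(pool[i]) for i in queried}


if __name__ == "__main__":
    pool = [[0.0, 0.0], [5.0, -5.0], [1.0, 1.0]]
    team = [Collaborator([1.0, 0.0]), Collaborator([0.0, 1.0])]
    new_labels = run_round(team, pool, oracle=lambda row: int(row[0] > 0), budget=1)
    print(new_labels)
```

In a fuller version, each collaborator would retrain or adapt its own model on the shared labels before the next round; that step is omitted here because it depends on each participant's private model.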
