Federated Learning with Only Positive Labels by Exploring Label Correlations
Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao
TL;DR
This work tackles federated multi-label classification where each client provides only positive data for a single label, a setting prone to embedding collapse. It introduces FedALC, a server-side correlation-regularized framework that harnesses label co-occurrence to separate uncorrelated labels while keeping correlated ones close, and a privacy-preserving label-collection scheme. A communication-efficient variant, FedALC-fixed, learns a fixed class-embedding matrix upfront to minimize data transfer and privacy risks. Across image and extreme multi-label text datasets, FedALC and its fixed variant achieve substantial improvements over prior methods (e.g., up to 8.17% MAP on VOC 2012 and 19.3% P@1 on Bibtex), validating the benefit of incorporating label correlations in FL with positive labels.
Abstract
Federated learning aims to collaboratively learn a model using data from multiple users under privacy constraints. In this paper, we study multi-label classification in the federated learning setting, where training can yield a trivial solution and extremely poor performance, especially when each client provides only positive data for a single class label. This issue can be addressed by adding a specially designed regularizer on the server side. Although this is sometimes effective, it ignores label correlations and may therefore yield sub-optimal performance. Moreover, frequently exchanging users' private embeddings between the server and clients is expensive and unsafe, especially when the model is trained in a contrastive manner. To remedy these drawbacks, we propose a novel and generic method termed Federated Averaging by exploring Label Correlations (FedALC). Specifically, FedALC estimates the label correlations between different label pairs during class embedding learning and uses them to improve model training. To further improve safety and reduce communication overhead, we propose a variant that learns a fixed class embedding for each client, so that the server and clients need to exchange class embeddings only once. Extensive experiments on multiple popular datasets demonstrate that FedALC significantly outperforms existing counterparts.
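To make the idea of a correlation-aware regularizer concrete, the following is a minimal NumPy sketch, not the paper's actual formulation. It assumes, purely for illustration, that a binary label matrix is available for estimating co-occurrence (the abstract does not say how the correlations are obtained), that correlation is measured as cosine similarity between label columns, and that the regularizer is a hinge-style pull/push term with a hypothetical `margin` parameter. The function names `cooccurrence_correlation` and `correlation_regularizer` are likewise hypothetical.

```python
import numpy as np


def cooccurrence_correlation(Y):
    """Estimate pairwise label correlation from a binary label matrix.

    Y has shape (n_samples, n_labels). The estimate used here is the cosine
    similarity between label columns (an assumption made for this sketch).
    """
    Y = np.asarray(Y, dtype=float)
    co = Y.T @ Y                      # co[i, j] = samples carrying both labels
    counts = np.diag(co)              # per-label positive counts
    denom = np.sqrt(np.outer(counts, counts))
    # Labels that never occur get zero correlation instead of a division error.
    return np.divide(co, denom, out=np.zeros_like(co), where=denom > 0)


def correlation_regularizer(W, corr, margin=1.0):
    """Correlation-weighted regularizer on class embeddings W (n_labels, d).

    Correlated pairs are pulled together (weight corr); uncorrelated pairs are
    pushed at least `margin` apart (weight 1 - corr). Illustrative form only.
    """
    W = np.asarray(W, dtype=float)
    diff = W[:, None, :] - W[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # pairwise Euclidean distances
    off_diag = ~np.eye(len(W), dtype=bool)        # ignore self-pairs
    pull = corr * dist ** 2
    push = (1.0 - corr) * np.maximum(0.0, margin - dist) ** 2
    return float(((pull + push) * off_diag).sum() / off_diag.sum())


# Labels 0 and 1 always co-occur; label 2 never co-occurs with either.
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
corr = cooccurrence_correlation(Y)

collapsed = np.zeros((3, 2))                      # all class embeddings identical
separated = np.array([[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]])

loss_collapsed = correlation_regularizer(collapsed, corr)
loss_separated = correlation_regularizer(separated, corr)
```

In this toy case the collapsed embeddings are penalized because the uncorrelated label 2 sits on top of labels 0 and 1, while the separated configuration keeps the correlated pair together and moves label 2 beyond the margin, so its penalty vanishes.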
