Federated Learning with Only Positive Labels by Exploring Label Correlations

Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

TL;DR

This work tackles federated multi-label classification where each client provides only positive data for a single label, a setting prone to embedding collapse. It introduces FedALC, a framework with a server-side correlation regularizer that uses label co-occurrence to separate uncorrelated labels while keeping correlated ones close, together with a privacy-preserving label-collection scheme. A communication-efficient variant, FedALC-fixed, learns a fixed class-embedding matrix upfront to reduce data transfer and privacy risks. Across image and extreme multi-label text datasets, FedALC and its fixed variant achieve substantial improvements over prior methods (up to 8.17% MAP on VOC 2012 and 19.3% P@1 on Bibtex), supporting the benefit of incorporating label correlations in FL with only positive labels.

Abstract

Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solutions and extremely poor performance may result, especially when each client provides only positive data for a single class label. This issue can be addressed by adding a specially designed regularizer on the server side. Although this is sometimes effective, it ignores label correlations and may therefore yield sub-optimal performance. Besides, frequently exchanging users' private embeddings between the server and clients is expensive and unsafe, especially when the model is trained in a contrastive manner. To remedy these drawbacks, we propose a novel and generic method termed Federated Averaging by exploring Label Correlations (FedALC). Specifically, FedALC estimates the label correlations for different label pairs during class embedding learning and uses them to improve model training. To further improve safety and reduce communication overhead, we propose a variant that learns a fixed class embedding for each client, so that the server and clients need to exchange class embeddings only once. Extensive experiments on multiple popular datasets demonstrate that FedALC significantly outperforms existing counterparts.
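
The idea of using label correlations to shape class embeddings can be illustrated with a small sketch. This is not the regularizer defined in the paper; it is a hypothetical pull/push penalty, assuming a correlation score in [0, 1] for each label pair and a margin hyperparameter (`correlation_regularizer`, `corr`, and `margin` are illustrative names). Correlated pairs are pulled together, uncorrelated pairs are pushed at least `margin` apart.

```python
import math

def distance(u, v):
    """Euclidean distance between two class embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def correlation_regularizer(embeddings, corr, margin=1.0):
    """Hypothetical correlation-weighted penalty (an assumption, not the paper's formula).

    embeddings: list of class embedding vectors, one per label
    corr:       corr[i][j] in [0, 1], estimated correlation of labels i and j
    """
    total = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(embeddings[i], embeddings[j])
            # Correlated labels: penalize large distances.
            pull = corr[i][j] * d ** 2
            # Uncorrelated labels: penalize distances below the margin.
            push = (1.0 - corr[i][j]) * max(0.0, margin - d) ** 2
            total += pull + push
    return total

# Labels 0 and 1 are fully correlated; label 2 is uncorrelated with both.
corr = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
separated = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]

penalty_collapsed = correlation_regularizer(collapsed, corr)
penalty_separated = correlation_regularizer(separated, corr)
```

In this toy case the collapsed embeddings incur a positive penalty, while the configuration that keeps the correlated pair together and moves the uncorrelated label away incurs none.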

Paper Structure (20 sections, 16 equations, 8 figures, 7 tables, 2 algorithms)

Figures (8)

  • Figure 1: Overview of the proposed federated averaging by exploring label correlations (FedALC) method. (1) Each client locally computes gradients for parameter updating and a hash code for each instance. The instance hash codes are used to calculate label correlations and need to be transmitted only once. (2) The client sends the locally updated model parameters, class embedding, and hash codes to the server. On the server, (3) the global model is obtained via parameter aggregation; (4) the different class embeddings are merged into a matrix, and the label distribution is obtained by comparing the hash codes; (5) the server then uses the designed correlation regularizer, based on the label distribution, to optimize the class embedding matrix; and (6) it finally transmits the global model parameters and the corresponding class embeddings to the clients.
  • Figure 2: Mechanism of the designed correlation regularizer under the federated learning paradigm. The orange, blue and green dots indicate the instance, positive class and negative class respectively.
  • Figure 3: FedALC with a fixed class embedding matrix. On the server side, positive class embeddings are enforced to be close to each other and separated from negative class embeddings, which yields the fixed class embeddings. On the client side, each instance is enforced to approach its positive classes.
  • Figure 4: Label set collection for each instance in the federated setting. Embeddings of the same data point on different clients are mapped to hash messages, which are sent to the server together with the label information. The server then compares the messages and aggregates the labels that share the same message.
  • Figure 5: Visualization of number heterogeneity among users on the EURlex dataset.
  • ...and 3 more figures
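
The label set collection described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: the hash function (SHA-256 over raw bytes with a shared salt), the byte-string instances, and all function names are hypothetical. It only shows the general pattern in which clients send hash messages with their single label and the server groups labels whose messages match.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def hash_message(instance: bytes, salt: bytes = b"shared-salt") -> str:
    """Hypothetical client-side hash of one instance (assumed SHA-256 with a shared salt)."""
    return hashlib.sha256(salt + instance).hexdigest()

def client_upload(instances, label):
    """A client holds only positive instances for one label; it sends (hash, label) pairs."""
    return [(hash_message(x), label) for x in instances]

def collect_label_sets(uploads):
    """Server side: aggregate the labels that arrive with the same hash message."""
    label_sets = defaultdict(set)
    for message, label in uploads:
        label_sets[message].add(label)
    return label_sets

def cooccurrence_counts(label_sets):
    """Count how often each (sorted) label pair appears on the same instance."""
    counts = defaultdict(int)
    for labels in label_sets.values():
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Three clients, one label each; "img-1" and "img-2" are shared across clients.
uploads = (
    client_upload([b"img-1", b"img-2"], "person")
    + client_upload([b"img-1", b"img-3"], "dog")
    + client_upload([b"img-2"], "car")
)
label_sets = collect_label_sets(uploads)
counts = cooccurrence_counts(label_sets)
```

The server never sees the instances themselves in this sketch, only their hash messages; the resulting co-occurrence counts are one possible input for estimating label correlations.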

Theorems & Definitions (1)

  • Definition 1