FedCDC: A Collaborative Framework for Data Consumers in Federated Learning Market

Zhuan Shi, Patrick Ohl, Boi Faltings

TL;DR

We address data scarcity and competition in Federated Learning (FL) markets by enabling Data Consumers (DCs) to collaborate on shared subtasks. FedCDC forms alliances through a weighted max-clique optimization solved with MaxSAT and trains a submodel for the shared subtask. This submodel is integrated into each DC’s global model via ensemble distillation, using entropy-weighted teacher logits and a distillation loss that combines the KL divergence $D_{KL}$ with cross-entropy. Experiments show that restricting access to Data Owners (DOs) harms performance, but alliance-based collaboration recovers much of the lost accuracy across FMNIST, CIFAR-10, and CIFAR-100, improving data efficiency and market effectiveness. The framework broadens FL by enabling task-modular collaboration, with potential extensions to recursive alliances, richer incentive mechanisms, and reduced dependence on public distillation data.

Abstract

Federated learning (FL) allows machine learning models to be trained on distributed datasets without directly accessing local data. In FL markets, numerous Data Consumers compete to recruit Data Owners for their respective training tasks, but budget constraints and competition can prevent them from securing sufficient data. While existing solutions focus on optimizing one-to-one matching between Data Owners and Data Consumers, we propose FedCDC, a novel framework that facilitates collaborative recruitment and training for Data Consumers with similar tasks. Specifically, FedCDC detects shared subtasks among multiple Data Consumers and coordinates the joint training of submodels specialized for these subtasks. Then, through ensemble distillation, these submodels are integrated into each Data Consumer's global model. Experimental evaluations on three benchmark datasets demonstrate that restricting Data Consumers' access to Data Owners significantly degrades model performance; however, by incorporating FedCDC, this performance loss is effectively mitigated, resulting in substantial accuracy gains for all participating Data Consumers.
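
To make the ensemble distillation step more concrete, the following is a minimal sketch of a loss that combines a KL term with cross-entropy over entropy-weighted teacher logits, as summarized in the TL;DR. Several details are assumptions made for illustration and are not taken from the paper: teacher weights are computed as a softmax over negative prediction entropies (so more confident teachers count more), the mixing coefficient `alpha` is hypothetical, and no temperature scaling is applied. The function names `combine_teachers` and `distillation_loss` are likewise invented here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(log_probs):
    """Shannon entropy (in nats) from log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def combine_teachers(teacher_logits):
    """Average teacher logits with entropy-based weights.

    Assumption: weight = softmax(-entropy), so low-entropy (confident)
    teachers receive larger weights. The paper's exact rule may differ.
    """
    entropies = [entropy(log_softmax(l)) for l in teacher_logits]
    weights = [math.exp(lw) for lw in log_softmax([-h for h in entropies])]
    n_classes = len(teacher_logits[0])
    combined = [
        sum(w * l[c] for w, l in zip(weights, teacher_logits))
        for c in range(n_classes)
    ]
    return combined, weights


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5):
    """alpha * KL(teacher || student) + (1 - alpha) * cross-entropy.

    `alpha` is a hypothetical mixing coefficient, not a value from the paper.
    """
    combined, _ = combine_teachers(teacher_logits)
    log_t = log_softmax(combined)
    log_s = log_softmax(student_logits)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    ce = -log_s[label]
    return alpha * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    # One confident teacher and one uninformative teacher (made-up logits).
    teachers = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    _, w = combine_teachers(teachers)
    print("teacher weights:", w)
    print("loss:", distillation_loss([1.0, 0.5, 0.0], teachers, label=0))
```

In a real training loop the same quantities would be computed on batched tensors in a deep learning framework; the pure-Python version here only illustrates how the two loss terms and the teacher weighting fit together.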

Paper Structure

This paper contains 21 sections, 10 equations, 8 figures, 1 table, 2 algorithms.

Figures (8)

  • Figure 1: Two DCs with overlapping target domains. Both DCs want to classify dogs and cats, so they will both try to recruit the red DOs who hold dog and cat data.
  • Figure 2: A DC-DO matching in an FL market with 2 DCs and 12 DOs. Both DCs try to recruit the red DOs, but due to limited computational resources, a DO can only be recruited by one DC. Therefore, the DCs fail to recruit all the DOs they want.
  • Figure 3: An overview of how FedCDC enables collaboration in a simple setting with 2 Data Consumers. The framework detects that the DCs share the subtask of classifying dogs and cats, so it proposes to create an alliance. Both DCs accept, so an artificial DC is created which will be trained on the shared subtask. Its knowledge is then distilled into the models of the two DCs using ensemble distillation.
  • Figure 4: We find an optimal combination of compatible alliances by solving a weighted max-clique problem. Each node corresponds to an alliance and its weight corresponds to the estimated value of the alliance. There exists an edge between every pair of compatible alliances. The maximum-weight clique therefore gives the combination of mutually compatible alliances with the highest total value.
  • Figure 5: A more complex collaboration pattern as it may emerge in an FL market. Four Data Consumers want to classify different sets of classes. FedCDC detects 4 separate subtasks and creates corresponding alliances. The alliances train their own models, which are then provided to the alliance participants.
  • ...and 3 more figures
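
The alliance selection described in the Figure 4 caption can be illustrated with a small sketch. The paper solves the weighted max-clique problem with MaxSAT; the code below swaps in plain brute-force enumeration, which is only practical for a handful of candidate alliances. The function name `best_alliance_set`, the alliance names, their values, and the compatibility edges are all hypothetical.

```python
from itertools import combinations


def best_alliance_set(values, compatible):
    """Return (total value, alliance names) of a maximum-weight clique.

    values:     dict mapping alliance name -> estimated value (node weight)
    compatible: set of frozenset({a, b}) pairs, one per edge between
                compatible alliances

    Brute-force stand-in for the MaxSAT formulation used in the paper.
    """
    names = sorted(values)
    best = (0, ())
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # A subset is a clique if every pair in it is compatible.
            is_clique = all(
                frozenset(pair) in compatible
                for pair in combinations(subset, 2)
            )
            if is_clique:
                total = sum(values[n] for n in subset)
                if total > best[0]:
                    best = (total, subset)
    return best


if __name__ == "__main__":
    # Made-up example: A1 and A3 are incompatible with each other.
    values = {"A1": 3, "A2": 2, "A3": 4}
    compatible = {frozenset({"A1", "A2"}), frozenset({"A2", "A3"})}
    print(best_alliance_set(values, compatible))
```

Enumeration checks every subset, so its cost grows exponentially with the number of candidate alliances; this is the reason a dedicated solver such as MaxSAT is the more realistic choice at market scale.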