FedCDC: A Collaborative Framework for Data Consumers in Federated Learning Market
Zhuan Shi, Patrick Ohl, Boi Faltings
TL;DR
We address data scarcity and competition in federated learning (FL) markets by enabling Data Consumers (DCs) to collaborate on shared subtasks. FedCDC forms alliances through a weighted max-clique optimization solved with MaxSAT and trains a submodel for the shared subtask. This submodel is then integrated into each DC's global model via ensemble distillation, using entropy-weighted teacher logits and a distillation loss that combines the KL divergence $D_{KL}$ with cross-entropy. Experiments show that restricting DCs' access to Data Owners (DOs) harms performance, and that alliance-based collaboration recovers much of the lost accuracy on FMNIST, CIFAR-10, and CIFAR-100, improving data efficiency and market effectiveness. The framework broadens FL by enabling task-modular collaboration, with potential extensions to recursive alliances, richer incentive mechanisms, and reduced dependence on public distillation data.
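To make the alliance-formation step concrete, the following is a minimal sketch of a weighted max-clique search over DCs. It is not the paper's formulation: it uses brute-force enumeration in place of a MaxSAT solver, and the names `best_alliance`, `shares_subtask`, and `weight`, as well as the choice of per-DC vertex weights, are assumptions made for illustration only.

```python
from itertools import combinations

def best_alliance(consumers, shares_subtask, weight):
    """Return the heaviest group of consumers that pairwise share a subtask.

    consumers:      list of consumer ids (hypothetical)
    shares_subtask: set of frozenset pairs that share a subtask (graph edges)
    weight:         dict mapping consumer id -> assumed benefit of including it
    Brute force is exponential; it stands in for MaxSAT on tiny inputs only.
    """
    best, best_weight = (), 0.0
    for size in range(2, len(consumers) + 1):
        for group in combinations(consumers, size):
            # A clique requires every pair in the group to be connected.
            is_clique = all(
                frozenset(pair) in shares_subtask
                for pair in combinations(group, 2)
            )
            if is_clique:
                total = sum(weight[c] for c in group)
                if total > best_weight:
                    best, best_weight = group, total
    return set(best), best_weight

# Toy market with four consumers (all values are made up).
consumers = ["A", "B", "C", "D"]
shares_subtask = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("CD")}
weight = {"A": 1.0, "B": 2.0, "C": 1.0, "D": 5.0}
alliance, total = best_alliance(consumers, shares_subtask, weight)
print(alliance, total)
```

In this toy instance the pair {C, D} outweighs the larger clique {A, B, C}, which shows why the objective is weighted rather than purely size-based.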
Abstract
Federated learning (FL) allows machine learning models to be trained on distributed datasets without directly accessing local data. In FL markets, numerous Data Consumers compete to recruit Data Owners for their respective training tasks, but budget constraints and competition can prevent them from securing sufficient data. While existing solutions focus on optimizing one-to-one matching between Data Owners and Data Consumers, we propose FedCDC, a novel framework that facilitates collaborative recruitment and training for Data Consumers with similar tasks. Specifically, FedCDC detects shared subtasks among multiple Data Consumers and coordinates the joint training of submodels specialized for these subtasks. Then, through ensemble distillation, these submodels are integrated into each Data Consumer's global model. Experimental evaluations on three benchmark datasets demonstrate that restricting Data Consumers' access to Data Owners significantly degrades model performance; however, incorporating FedCDC effectively mitigates this performance loss, yielding substantial accuracy gains for all participating Data Consumers.
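The ensemble distillation step can be illustrated with a small sketch of a loss that mixes KL divergence and cross-entropy, with teacher logits combined by entropy-based weights. The exact weighting rule is not given here, so the choice below (weights proportional to exp(-entropy), so that more confident teachers count more), the mixing factor `alpha`, and the `temperature` are all assumptions rather than the authors' settings.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weighted_teacher(teacher_logits):
    """Average teacher logits; assumed weight for each teacher is exp(-entropy)."""
    raw = [math.exp(-entropy(softmax(logits))) for logits in teacher_logits]
    total = sum(raw)
    weights = [r / total for r in raw]
    num_classes = len(teacher_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, teacher_logits))
        for c in range(num_classes)
    ]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """alpha * KL(teacher || student) + (1 - alpha) * cross-entropy on the label."""
    teacher = softmax(entropy_weighted_teacher(teacher_logits), temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student_soft) if t > 0.0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kl + (1.0 - alpha) * ce

# Two hypothetical teachers (e.g. a shared submodel and a local model), one sample.
loss = distillation_loss(
    student_logits=[0.2, 0.1, 0.4],
    teacher_logits=[[4.0, 0.5, 0.1], [1.0, 0.9, 1.1]],
    label=0,
)
print(loss)
```

Here the first teacher is far more confident than the second, so it dominates the combined target; a different weighting rule would change only `entropy_weighted_teacher`.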
