Prediction-space knowledge markets for communication-efficient federated learning on multimedia tasks

Wenzhang Du

TL;DR

KTA v2 introduces a prediction-space knowledge trading market for federated learning that personalizes knowledge transfer by constructing per-client teachers from predictions on a public reference set. By exchanging logits rather than full model parameters and weighting neighbor contributions by similarity and reference accuracy, it achieves strong accuracy under aggressive communication budgets and shows robustness to non-IID data, including in large-model settings. The approach can be interpreted as prediction-space regularization, mitigating client drift while maintaining architecture flexibility. Empirical results across FEMNIST, CIFAR-10, and AG News demonstrate substantial communication savings with competitive or superior performance compared to parameter-based methods and FedMD, highlighting practical impact for multimedia FL scenarios with heterogeneous clients.

Abstract

Federated learning (FL) enables collaborative training over distributed multimedia data but suffers acutely from statistical heterogeneity and communication constraints, especially when clients deploy large models. Classic parameter-averaging methods such as FedAvg transmit full model weights and can diverge under non-independent and identically distributed (non-IID) data. We propose KTA v2, a prediction-space knowledge trading market for FL. Each round, clients locally train on their private data, then share only logits on a small public reference set. The server constructs a client-client similarity graph in prediction space, combines it with reference-set accuracy to form per-client teacher ensembles, and sends back personalized soft targets for a second-stage distillation update. This two-stage procedure can be interpreted as approximate block-coordinate descent on a unified objective with prediction-space regularization. Experiments on FEMNIST, CIFAR-10, and AG News show that, under comparable or much lower communication budgets, KTA v2 consistently outperforms a local-only baseline and strong parameter-based methods (FedAvg, FedProx), and substantially improves over a FedMD-style global teacher. On CIFAR-10 with ResNet-18, KTA v2 reaches 57.7% test accuracy using approximately 1/1100 of FedAvg's communication, while on AG News it attains 89.3% accuracy with approximately 1/300 of FedAvg's traffic.
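
The server-side step described in the abstract (similarity graph in prediction space, combined with reference-set accuracy, yielding per-client soft targets) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the function name `personalized_teachers`, cosine similarity between softened predictions, multiplying similarity by neighbor accuracy, excluding each client from its own teacher, and the `temperature` parameter. It should not be read as the authors' implementation.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def personalized_teachers(logits, ref_labels, temperature=1.0):
    """Build one soft-target teacher per client (illustrative sketch).

    logits:     (K, N, C) logits from K clients on N public reference samples.
    ref_labels: (N,) integer labels of the reference set.
    Returns (weights, teachers) with shapes (K, K) and (K, N, C).
    """
    probs = softmax(logits / temperature)                # (K, N, C)
    num_clients = probs.shape[0]

    # Assumed similarity: cosine similarity of flattened prediction vectors.
    flat = probs.reshape(num_clients, -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    similarity = flat @ flat.T                           # (K, K)

    # Reference-set accuracy of each client; epsilon avoids all-zero rows.
    accuracy = (logits.argmax(axis=-1) == ref_labels).mean(axis=1) + 1e-8

    # Assumed combination: neighbor j's weight for client i is
    # similarity[i, j] * accuracy[j], with self-weights removed.
    weights = similarity * accuracy[None, :]
    np.fill_diagonal(weights, 0.0)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Teacher for client i is a convex combination of neighbor predictions.
    teachers = np.einsum("ij,jnc->inc", weights, probs)
    return weights, teachers
```

Each row of `weights` sums to one, so every teacher is a valid probability distribution per reference sample; a client would then distill toward its own row of `teachers` in the second-stage update. The sketch assumes at least two clients.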

Paper Structure

This paper contains 27 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: CIFAR-10 (SimpleCNN) accuracy versus communication for Local, FedAvg, FedProx and KTA v2 at $\alpha=0.5$; KTA v2 stays within 0--8 MB.
  • Figure 2: CIFAR-10 + ResNet-18 communication/accuracy trajectory. KTA v2 reaches 57.7% with $\approx$3.8 MB, while FedAvg attains 42.1% at $\approx$4265.5 MB.
  • Figure 3: CIFAR-10 non-IID sweep (SimpleCNN). Test accuracy after 10 rounds under Dirichlet $\alpha$; smaller $\alpha$ means stronger label skew.
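
The communication figures quoted in the Figure 2 caption can be checked with a short calculation. The sketch below computes the FedAvg-to-KTA v2 traffic ratio from the caption's numbers, and adds a hypothetical helper, `logit_payload_mb`, that estimates per-round logit traffic. The helper's defaults (32-bit floats, one upload plus one download per round) and the example reference-set size are assumptions for illustration, not values reported in the paper.

```python
def comm_ratio(baseline_mb, method_mb):
    # How many times more traffic the baseline uses than the method.
    return baseline_mb / method_mb


def logit_payload_mb(num_ref_samples, num_classes, bytes_per_value=4, directions=2):
    # Hypothetical per-client, per-round traffic when only logits are exchanged:
    # one value per (reference sample, class), sent up and received back.
    return num_ref_samples * num_classes * bytes_per_value * directions / 1e6


# Values taken from the Figure 2 caption (CIFAR-10 + ResNet-18).
ratio = comm_ratio(4265.5, 3.8)
print(f"FedAvg uses about {ratio:.0f}x the communication of KTA v2")

# Illustrative only: 1000 reference samples and 10 classes are assumed sizes.
print(f"Example logit payload: {logit_payload_mb(1000, 10):.2f} MB per round")
```

The ratio comes out at roughly 1,100, consistent with the "approximately 1/1100 of FedAvg's communication" stated in the abstract. The helper also shows why logit exchange is cheap: its cost scales with the reference-set size and the number of classes rather than with the number of model parameters.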