CoDream: Exchanging dreams instead of models for federated aggregation with heterogeneous models

Abhishek Singh; Gauri Gupta; Ritvik Kapila; Yichuan Shi; Alex Dang; Sheshank Shankar; Mohammed Ehab; Ramesh Raskar

TL;DR

CoDream introduces a federated framework in which clients exchange dreams (randomly initialized data points optimized in input space) to capture the global data distribution without sharing raw data or model parameters. It enables model-agnostic collaboration by performing knowledge extraction, collaborative dreaming, and knowledge acquisition entirely in the data space, with gradients aggregated linearly so that the scheme stays compatible with secure aggregation. The approach combines entropy-based local dreaming, regularized dream optimization, and a knowledge-distillation-based acquisition stage, achieving competitive accuracy on MNIST, SVHN, and CIFAR-10 under both IID and non-IID settings with heterogeneous client models. Empirical results demonstrate CoDream’s scalability, robustness to heterogeneity, and a communication cost that does not depend on model size, suggesting practical applicability for privacy-preserving, cross-architecture federated learning.
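Because the dream updates are combined linearly, they fit additive-masking schemes for secure aggregation. The sketch below is a minimal illustration under stated assumptions, not the paper's protocol: the helper names (`pairwise_masks`, `mask_updates`, `server_mean`) are hypothetical, the masks are drawn from one seeded generator for brevity (a real protocol would derive each pairwise mask from a secret shared only by that pair, typically over a finite field), and the gradient values are made up.

```python
import random

def pairwise_masks(n_clients, dim, seed=0):
    """Build one mask per client such that all masks sum to zero.

    Assumption: a single seeded RNG stands in for pairwise shared secrets.
    """
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            r = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for d in range(dim):
                masks[i][d] += r[d]  # client i adds the pair's mask
                masks[j][d] -= r[d]  # client j subtracts the same mask
    return masks

def mask_updates(updates, masks):
    """Each client hides its dream gradient behind its own mask."""
    return [[u + m for u, m in zip(upd, msk)] for upd, msk in zip(updates, masks)]

def server_mean(vectors):
    """Server-side linear aggregation: coordinate-wise mean."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Illustrative (made-up) dream gradients from three clients.
dream_grads = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1], [0.1, 0.0, -0.2]]
masked = mask_updates(dream_grads, pairwise_masks(3, 3))

plain_mean = server_mean(dream_grads)
secure_mean = server_mean(masked)  # masks cancel in the sum
```

The server only ever sees `masked`, yet `secure_mean` matches `plain_mean` up to floating-point error, which is the property a linear aggregation rule needs.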

Abstract

Federated Learning (FL) enables collaborative optimization of machine learning models across decentralized data by aggregating model parameters. Our approach extends this concept by aggregating "knowledge" derived from models, instead of model parameters. We present a novel framework called CoDream, where clients collaboratively optimize randomly initialized data using federated optimization in the input data space, similar to how randomly initialized model parameters are optimized in FL. Our key insight is that jointly optimizing this data can effectively capture the properties of the global data distribution. Sharing knowledge in data space offers numerous benefits: (1) model-agnostic collaborative learning, i.e., different clients can have different model architectures; (2) communication that is independent of the model size, eliminating scalability concerns with model parameters; (3) compatibility with secure aggregation, thus preserving the privacy benefits of federated learning; (4) adaptive optimization of the shared knowledge for personalized learning. We empirically validate CoDream on standard FL tasks, demonstrating competitive performance despite not sharing model parameters. Our code: https://mitmedialab.github.io/codream.github.io/
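The idea of running federated optimization in input space can be sketched with two toy clients that hold different architectures. This is a simplified illustration, not the authors' implementation: the models, sizes, learning rate, and function names are assumptions, only the entropy term is minimized (the regularizers are omitted), and central finite differences stand in for backpropagation so that the example needs nothing beyond the standard library.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0.0)

def make_linear(rng, d_in, d_out):
    """Hypothetical client model: a random affine map returning logits."""
    W = [[rng.gauss(0.0, 0.5) for _ in range(d_in)] for _ in range(d_out)]
    b = [rng.gauss(0.0, 0.5) for _ in range(d_out)]
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) + bi
                      for row, bi in zip(W, b)]

def make_mlp(rng, d_in, d_hid, n_cls):
    """Hypothetical client model with a different architecture (one tanh layer)."""
    hidden = make_linear(rng, d_in, d_hid)
    out = make_linear(rng, d_hid, n_cls)
    return lambda x: out([math.tanh(h) for h in hidden(x)])

def local_dream_grad(model, x, eps=1e-5):
    """Client side: d(entropy)/dx by central differences (stand-in for backprop)."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        diff = entropy(softmax(model(hi))) - entropy(softmax(model(lo)))
        grad.append(diff / (2 * eps))
    return grad

def codream(models, x0, rounds=200, lr=0.05):
    """Server loop: average the clients' input-space gradients, update the dream."""
    x = list(x0)
    for _ in range(rounds):
        grads = [local_dream_grad(m, x) for m in models]  # computed locally
        avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(x))]
        x = [xi - lr * gi for xi, gi in zip(x, avg)]
    return x

def mean_entropy(models, x):
    return sum(entropy(softmax(m(x))) for m in models) / len(models)

rng = random.Random(0)
models = [make_linear(rng, 4, 3), make_mlp(rng, 4, 5, 3)]  # heterogeneous clients
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]               # randomly initialized dream
x_final = codream(models, x0)
before = mean_entropy(models, x0)
after = mean_entropy(models, x_final)
```

Only vectors with the dimension of the dream cross the client boundary, so the exchanged payload has the size of the input, not of either model.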

Paper Structure (32 sections, 9 equations, 12 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
CoDream
Local dreaming for extracting knowledge from models
Collaborative dreaming for knowledge aggregation
Knowledge acquisition
Analysis of CoDream
Experiments
Fast dreaming for knowledge extraction
Real-world datasets/comparison with FL
Flexibility of models: Model-agnostic
Varying number of clients
Analysis of sample complexity of dreams
Validating knowledge-extraction based on Eq \ref{eq:knowledge_extraction}
...and 17 more sections

Figures (12)

Figure 1: Landscape of FL techniques. Here we use Fed.-Federated, Gen.-Generative, Syn.-Synthetic, Pred.-Predictive, Comm.-Communication, Comp.-Computation, Het.-Heterogeneous, Agg.-Aggregation. By levels of privacy, we mean how distant the shared updates are from raw data. Synthetic data and dreams are two levels of indirection away from the raw data, further removed than shared models.
Figure 2: Overview of the CoDream pipeline comprising three stages. (1) Knowledge Extraction: each client generates dreams, representing the knowledge extracted from its local model (teacher). Starting with random noise images and frozen teacher models, clients optimize the images to reduce the entropy of the output distribution, with batch-norm and adaptive-loss regularization. The clients share their local updates of dreams and logits with the server. (2) Knowledge Aggregation: the server aggregates dreams and soft labels from clients to construct a CoDream dataset. (3) Knowledge Acquisition: clients update their local models through two-stage training, (i) on the jointly optimized co-dreams with knowledge distillation (where clients act as students) and (ii) on the local dataset with cross-entropy loss.
Figure 3: Comparing the aggregation frameworks of FL and CoDream. In FL, the server aggregates the gradients of model parameters, whereas in CoDream, aggregation happens on gradients in the data space, i.e., on the dreams ($\hat{x}$), allowing for different model architectures. Here $K$ is the number of clients and $l, \tilde{l}$ are the loss functions given in Eq \ref{eq:fl} and Eq \ref{eq:knowledge_extraction}.
Figure 4: Comparison by varying the number of clients. The performance gap widens between CoDream and independent optimization as we increase the number of clients.
Figure 5: Sample complexity of generated dreams for effective knowledge transfer
...and 7 more figures
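The knowledge aggregation and acquisition stages described in the Figure 2 caption can be illustrated with a small distillation sketch. The specifics here are assumptions rather than details from the source: soft labels are combined by a plain average, the distillation loss is a KL divergence without temperature scaling, and the function names and probability values are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def aggregate_soft_labels(client_probs):
    """Server side (assumed rule): average the clients' soft labels for one dream."""
    n = len(client_probs)
    return [sum(col) / n for col in zip(*client_probs)]

def kd_loss(teacher_probs, student_logits):
    """KL(teacher || student) for one dream; the student is the local client model."""
    student_probs = softmax(student_logits)
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0.0)

# Made-up soft labels from two clients for the same dream.
teacher = aggregate_soft_labels([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])

# A student that already matches the aggregated soft label incurs ~zero loss.
matched_loss = kd_loss(teacher, [math.log(t) for t in teacher])

# A student with uniform predictions incurs a positive loss.
uniform_loss = kd_loss(teacher, [0.0, 0.0, 0.0])
```

In the two-stage training from the caption, a loss of this kind would be minimized on the co-dreams first, followed by ordinary cross-entropy training on the client's local dataset.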
