Dataset Dictionary Learning in a Wasserstein Space for Federated Domain Adaptation

Eduardo Fernandes Montesuma; Fabiola Espinoza Castellon; Fred Ngolè Mboula; Aurélien Mayoue; Antoine Souloumiac; Cédric Gouy-Pailler

TL;DR

This work tackles privacy-aware decentralized MSDA by modeling cross-domain shifts in Wasserstein space. It expresses each domain's distribution as a Wasserstein barycenter $\mathcal{B}(\alpha_\ell;\mathcal{P})$ of shared atoms $\mathcal{P}$ with private barycentric coordinates $\alpha_\ell$, learned through a two-stage federated pipeline: FedAVG-based encoder training and decentralized dictionary learning. The approach yields two target adaptation modes, Reconstruction and Ensembling, and demonstrates superior performance over state-of-the-art decentralized MSDA methods across five visual benchmarks, along with improved robustness to client parallelism and reduced communication cost. The work also provides theoretical insight that the objective is locally quadratic under small perturbations, supporting stable federated optimization. Overall, it enables privacy-preserving, efficient domain adaptation in federated settings with strong empirical validation.
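The first stage of the pipeline relies on FedAVG, whose server-side aggregation is a sample-weighted average of client parameters. A minimal sketch of that single step follows, assuming each client's encoder parameters are flattened into a list of floats; the function name `fedavg` and the toy numbers are hypothetical and are not taken from the paper.

```python
def fedavg(client_params, client_sizes):
    """Aggregate client parameters as in FedAVG (sample-weighted mean).

    client_params: list of equally long parameter vectors (hypothetical layout).
    client_sizes:  number of local samples held by each client.
    """
    if len(client_params) != len(client_sizes):
        raise ValueError("one sample count is needed per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    # Each coordinate is averaged with weights proportional to local data size.
    return [
        sum(n * p[j] for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]


# Toy example: two clients, the second holds three times more data.
global_params = fedavg([[1.0, 2.0], [3.0, 6.0]], [1, 3])
print(global_params)
```

Only parameters and sample counts are exchanged in this step, so raw client data never leaves the client.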

Abstract

Multi-Source Domain Adaptation (MSDA) is a challenging scenario where multiple related and heterogeneous source datasets must be adapted to an unlabeled target dataset. Conventional MSDA methods often overlook that data holders may have privacy concerns, hindering direct data sharing. In response, decentralized MSDA has emerged as a promising strategy to achieve adaptation without centralizing clients' data. Our work proposes a novel approach, Decentralized Dataset Dictionary Learning, to address this challenge. Our method leverages Wasserstein barycenters to model the distributional shift across multiple clients, enabling effective adaptation while preserving data privacy. Specifically, our algorithm expresses each client's underlying distribution as a Wasserstein barycenter of public atoms, weighted by private barycentric coordinates. Our approach ensures that the barycentric coordinates remain undisclosed throughout the adaptation process. Extensive experimentation across five visual domain adaptation benchmarks demonstrates the superiority of our strategy over existing decentralized MSDA techniques. Moreover, our method exhibits enhanced robustness to client parallelism while maintaining relative resilience compared to conventional decentralized MSDA methodologies.
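To make the idea of "public atoms weighted by private barycentric coordinates" concrete, here is a minimal sketch in one dimension, where the 2-Wasserstein barycenter of empirical distributions with the same number of equally weighted samples is obtained by averaging sorted samples. This is a simplification for illustration only: the paper works with datasets in a feature space, and the helper `barycenter_1d` and the toy atoms are assumptions, not the authors' implementation.

```python
def barycenter_1d(atoms, weights):
    """2-Wasserstein barycenter of 1-D empirical distributions.

    Assumes every atom has the same number of equally weighted samples.
    """
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie on the probability simplex")
    n = len(atoms[0])
    if any(len(a) != n for a in atoms):
        raise ValueError("all atoms must have the same number of samples")
    sorted_atoms = [sorted(a) for a in atoms]
    # In 1-D, optimal transport matches order statistics, so the barycenter
    # support is the weighted average of the sorted samples.
    return [
        sum(w * a[i] for w, a in zip(weights, sorted_atoms))
        for i in range(n)
    ]


# Shared (public) atoms; each client keeps its own coordinates private.
atoms = [[0.0, 1.0, 2.0], [10.0, 12.0, 11.0]]
client_a = barycenter_1d(atoms, [1.0, 0.0])  # reproduces the first atom
client_b = barycenter_1d(atoms, [0.5, 0.5])  # halfway between both atoms
print(client_a, client_b)
```

Different coordinates over the same atoms yield different reconstructed distributions, which is why the atoms can be shared while the coordinates stay on each client.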

Paper Structure (14 sections, 2 theorems, 28 equations, 5 figures, 5 tables, 2 algorithms)

Introduction
Related Work
Proposed Approach
Background
Federated Learning an Encoder Network
Federated Dataset Dictionary Learning
Domain Adaptation
Experiments
Conclusion
Introduction
Proof of Theorem 3.1
Experiments
Reproducing the State-of-the-art
Hyper-Parameter Settings

Key Result

Theorem 3.1

Let $(\mathcal{P},\mathcal{A})$ be a dictionary, and $\epsilon \in \mathbb{R}^{d}$ be a random perturbation. Let $\tilde{\mathcal{P}} = \{\tilde{P}_{k}\}_{k=1}^{K}$ such that, then,

Figures (5)

Figure 1: Illustration of our decentralized strategy. (a) We fit a deep neural network composed of an encoder network $\phi$ and a classifier $h$, without centralizing client data. In principle, the target client does not participate in this step, unless some adaptation method is used (e.g., KD3A; Feng et al., 2021). (b) We perform the adaptation step with features extracted from the fine-tuned source model, through our proposed method.
Figure 2: $1$-dimensional (a) and $2$-dimensional (b) analysis of our method's loss. (a) Similarly to FedAVG (McMahan et al., 2017), interpolating between two atom versions obtained by clients with a shared initialization decreases the overall loss value. (b) We illustrate Theorem 3.1 empirically on Caltech-Office 10, showing that our method's loss is locally quadratic.
Figure 3: In (a), we show the communication cost in % relative to the cost of communicating the parameters of the backbone. In (b), we show the hyper-parameter sensitivity of our method on the Office-Home benchmark.
Figure 4: t-SNE embeddings of the distribution alignments of our method, KD3A and FADA. Blue points correspond to target domain points, whereas red points correspond to samples in the barycenter support (our method) and in the source domains (KD3A and FADA).
Figure 5: Dataset distillation on the Adaptiope benchmark. (a-c) show a comparison between real (blue) and reconstructed (green) data points. (d-f) show the entropy of the labels of reconstructed data points. For low values of , samples have higher label entropy. (g-i) show the distribution of label entropies, in line with the conclusion of (d-f). Finally, (j-l) compare the performance of distillation with samples generated by our method against randomly sub-sampling the source (green) and target (yellow). For $=5$, one reaches state-of-the-art performance. This represents around 1.67% of the total number of samples.

Theorems & Definitions (4)

Definition 3.1
Theorem 3.1
Theorem B.1
proof
