A Mutual Information Perspective on Federated Contrastive Learning

Christos Louizos, Matthias Reisser, Denis Korzhenkov

TL;DR

This work studies federated contrastive learning through a multi-view MI lens, showing that naive local SimCLR optimization bounds the client-specific MI $I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2|s)$, while introducing a user-verification loss yields a tractable lower bound to the global MI $I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2)$. By coupling this UV term with optional label-based objectives, the authors extend SimCLR to federated semi-supervised settings, deriving decomposable bounds suitable for FedAvg. Empirical results on CIFAR-10/100 and TinyImagenet reveal that UV-based global MI optimization provides consistent gains under label-skew non-i.i.d.-ness and generalizes to spectral CL and SimSiam, though its efficacy depends on the source of non-i.i.d.-ness. Overall, the paper furnishes a principled MI-based framework for federated pretraining and offers guidance on when global MI objectives improve downstream tasks.
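The link between the client-specific and the global MI can be sketched with the chain rule of mutual information. The derivation below is one standard way to see it, not necessarily the exact argument in the paper; the variational client classifier $r_\phi(s|{\mathbf{z}})$ is a symbol assumed here for illustration, and $s$ is taken to be discrete.

$$
\begin{aligned}
I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2, s) &= I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2) + I_\theta({\mathbf{z}}_1; s|{\mathbf{z}}_2) = I_\theta({\mathbf{z}}_1; s) + I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2|s) \\
\Rightarrow\; I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2) &= I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2|s) + I_\theta({\mathbf{z}}_1; s) - I_\theta({\mathbf{z}}_1; s|{\mathbf{z}}_2) \\
I_\theta({\mathbf{z}}_1; s) &= H(s) - H(s|{\mathbf{z}}_1) \geq H(s) + \mathbb{E}\left[\log r_\phi(s|{\mathbf{z}}_1)\right] \\
I_\theta({\mathbf{z}}_1; s|{\mathbf{z}}_2) &\leq H(s|{\mathbf{z}}_2) \leq -\mathbb{E}\left[\log r_\phi(s|{\mathbf{z}}_2)\right] \\
\Rightarrow\; I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2) &\geq I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2|s) + H(s) + \mathbb{E}\left[\log r_\phi(s|{\mathbf{z}}_1)\right] + \mathbb{E}\left[\log r_\phi(s|{\mathbf{z}}_2)\right]
\end{aligned}
$$

Under these assumptions, the two expectation terms are negative cross-entropies of a classifier that predicts the user ID from each view, which is the role the user-verification loss plays alongside the local contrastive bound on $I_\theta({\mathbf{z}}_1; {\mathbf{z}}_2|s)$.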

Abstract

We investigate contrastive learning in the federated setting through the lens of SimCLR and multi-view mutual information maximization. In doing so, we uncover a connection between contrastive representation learning and user verification; by adding a user verification loss to each client's local SimCLR loss we recover a lower bound to the global multi-view mutual information. To accommodate the case where some labelled data are available at the clients, we extend our SimCLR variant to the federated semi-supervised setting. We see that a supervised SimCLR objective can be obtained with two changes: a) the contrastive loss is computed between datapoints that share the same label and b) we require an additional auxiliary head that predicts the correct labels from either of the two views. Along with the proposed SimCLR extensions, we also study how different sources of non-i.i.d.-ness can impact the performance of federated unsupervised learning through global mutual information maximization; we find that a global objective is beneficial for some sources of non-i.i.d.-ness but can be detrimental for others. We empirically evaluate our proposed extensions in various tasks to validate our claims and furthermore demonstrate that our proposed modifications generalize to other pretraining methods.
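The combination described in the abstract, a local contrastive loss plus a user verification loss, can be illustrated with a minimal NumPy sketch. Everything named here is an assumption for illustration rather than the authors' implementation: the function names, the one-directional InfoNCE form (a simplification of SimCLR's NT-Xent loss), the temperature value, and the linear softmax classifier over client IDs stored in `client_weights`.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(z1, z2, temperature=0.5):
    """One-directional InfoNCE: row i of z1 should match row i of z2.

    Simplified stand-in for SimCLR's NT-Xent loss (assumption).
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # (N, N) cosine similarities
    idx = np.arange(len(z1))
    return -log_softmax(logits, axis=1)[idx, idx].mean()


def user_verification_loss(z, client_id, client_weights):
    """Cross-entropy of predicting the client ID from representations z.

    client_weights has shape (D_z, num_clients); a linear classifier is
    a hypothetical choice made only for this sketch.
    """
    logits = z @ client_weights               # (N, num_clients)
    return -log_softmax(logits, axis=1)[:, client_id].mean()


def federated_simclr_loss(z1, z2, client_id, client_weights, temperature=0.5):
    """Local contrastive loss plus user verification on both views."""
    contrastive = info_nce(z1, z2, temperature)
    uv = (user_verification_loss(z1, client_id, client_weights)
          + user_verification_loss(z2, client_id, client_weights))
    return contrastive + uv
```

Each client would evaluate `federated_simclr_loss` on its own batch with its own `client_id`, so the objective stays decomposable across clients; how the classifier parameters are shared or aggregated is left open in this sketch.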

Paper Structure

This paper contains 30 sections, 8 theorems, 31 equations, 3 figures, 7 tables, 2 algorithms.

Key Result

Proposition 1

Let $s \in \mathbb{N}$ denote the user ID, ${\mathbf{x}} \in \mathbb{R}^{D_x}$ the input and ${\mathbf{z}}_1, {\mathbf{z}}_2 \in \mathbb{R}^{D_z}$ the latent representations of the two views of ${\mathbf{x}}$ given by the encoder with parameters $\theta$. Given a critic function $f: \mathbb{R}^{D_z}

Figures (3)

  • Figure 1: Graphical model of the assumed generative process under the various sources of non-i.i.d.-ness: label-skew, covariate shift and joint shift.
  • Figure 2: Overview of the SimCLR architectures considered. Local SimCLR (left): each client optimizes a contrastive loss on their own data, thus the federation implicitly optimizes a lower bound to $\mathrm{I}({\mathbf{z}}_1; {\mathbf{z}}_2| s)$. Federated SimCLR (center): along with the contrastive loss on their own data, each client also optimizes a client classifier, thus the federation implicitly optimizes a lower bound to $\mathrm{I}({\mathbf{z}}_1; {\mathbf{z}}_2)$. Supervised federated SimCLR (right): a label-dependent variant of federated SimCLR that encourages clustering according to the label while also optimizing a lower bound to $\mathrm{I}({\mathbf{z}}_1; {\mathbf{z}}_2)$.
  • Figure 3: CIFAR-10 ablation studies. (a) Performance of local and federated SimCLR as a function of the non-i.i.d.-ness strength $\alpha$ for covariate shift and label skew. (b) Performance of local and federated SimCLR for different numbers of local epochs $E$ in the case of strong ($\alpha=0.1$) covariate shift and label skew. (c) Performance of local and federated SimCLR in the semi-supervised setting as a function of the amount of available labelled data.

Theorems & Definitions (12)

  • Proposition 1
  • Lemma 2.1
  • Lemma 2.2
  • Proposition 2
  • Proposition 1
  • proof
  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • ...and 2 more