Modality Alignment Meets Federated Broadcasting

Yuting Ma, Shengeng Tang, Xiaohua Xu, Lechao Cheng

TL;DR

This paper introduces a novel FL framework built on modality alignment, in which a text encoder resides on the server and image encoders operate on local devices, facilitating cross-client knowledge sharing and improving performance under extreme heterogeneity.

Abstract

Federated learning (FL) has emerged as a powerful approach to safeguard data privacy by training models across distributed edge devices without centralizing local data. Despite advances in homogeneous data scenarios, maintaining performance across the global model and local clients over heterogeneous data remains challenging, because variations in data distribution degrade model convergence and increase computational costs. This paper introduces a novel FL framework leveraging modality alignment, where a text encoder resides on the server and image encoders operate on local devices. Inspired by multi-modal learning paradigms like CLIP, this design aligns cross-client learning by treating server-client communication as a form of multi-modal broadcasting. We initialize with a pre-trained model to mitigate overfitting and update only select parameters through low-rank adaptation (LoRA) to balance computational cost against performance. Local models train independently and communicate updates to the server, which aggregates parameters via a query-based method, facilitating cross-client knowledge sharing and improving performance under extreme heterogeneity. Extensive experiments on benchmark datasets demonstrate the framework's efficacy in maintaining generalization and robustness, even in highly heterogeneous settings.

Paper Structure

This paper contains 28 sections, 14 equations, 8 figures, 8 tables, 1 algorithm.

Figures (8)

  • Figure 1: Left: Data distribution approximation based on Dirichlet distribution. Right: By leveraging a global text module on the server as a “constraint”, we can anchor image category learning across distributed nodes to achieve knowledge sharing.
  • Figure 2: (I) When $\alpha$ is set to a large value (e.g., $\alpha=100$), the probabilities concentrate on near-uniform distributions. (II) When $\alpha = 1$, imbalanced distributions are spread uniformly over the $K$-dimensional space. (III) When $\alpha$ approaches $0$, the probability over clients degenerates into a near one-hot partition.
  • Figure 3: Global accuracy (%) comparisons across different $\mu$, boundary layers, $r$, and the LoRA starting layers over the Dir data setting. The dashed line represents our method under the pure weighted aggregation strategy.
  • Figure 4: The framework of FedAlign.
  • Figure 5: The data partition among clients. The x-axis and y-axis represent the index of the category and client, respectively.
  • ...and 3 more figures
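
The effect of $\alpha$ described in the Figure 2 caption can be illustrated with a minimal sketch of a Dirichlet-based label partition. This is a common recipe for simulating heterogeneous clients, not necessarily the exact protocol used in the paper; the function names `sample_dirichlet` and `partition_by_class`, the per-class splitting scheme, and the toy label set are all assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k components.

    Uses the standard construction: normalized independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Numerical underflow for very small alpha: fall back to a one-hot vector.
        hot = rng.randrange(k)
        return [1.0 if i == hot else 0.0 for i in range(k)]
    return [d / total for d in draws]


def partition_by_class(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: split sample indices across clients, class by class.

    For each class, client proportions are drawn from Dirichlet(alpha).
    Large alpha gives near-uniform shares; small alpha gives near one-hot shares.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = sample_dirichlet(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes the remainder so every sample is assigned once.
            end = len(idxs) if c == num_clients - 1 else int(round(cum * len(idxs)))
            end = min(max(end, start), len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # 10 balanced toy classes
    for a in (100.0, 1.0, 0.1):
        sizes = [len(c) for c in partition_by_class(toy_labels, 5, a)]
        print(f"alpha={a}: client sizes = {sizes}")
```

Running the script prints the per-client sample counts for three values of $\alpha$; with a large $\alpha$ the counts should be close to equal, while a small $\alpha$ tends to concentrate each class on very few clients.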