A Multi-Modal Federated Learning Framework for Remote Sensing Image Classification

Barış Büyüktaş, Gencer Sumbul, Begüm Demir

TL;DR

This paper tackles privacy-aware remote sensing (RS) image classification when training data reside on decentralized clients that hold different data modalities. It introduces a multi-modal federated learning framework with three modules, multi-modal fusion (MF), feature whitening (FW), and mutual information maximization (MIM), that learns from client data without sharing them. Experiments on BigEarthNet-MM and Dynamic World-Expert show that the framework outperforms state-of-the-art FL methods across several decentralization scenarios, and ablations confirm that MF, FW, and MIM are complementary. The approach keeps data local, accommodates heterogeneous modalities, and integrates flexibly with modality-specific backbones.

Abstract

Federated learning (FL) enables the collaborative training of deep neural networks across decentralized data archives (i.e., clients) without sharing the local data of the clients. Most of the existing FL methods assume that the data distributed across all clients is associated with the same data modality. However, remote sensing (RS) images present in different clients can be associated with diverse data modalities. The joint use of the multi-modal RS data can significantly enhance classification performance. To effectively exploit decentralized and unshared multi-modal RS data, our paper introduces a novel multi-modal FL framework for RS image classification problems. The proposed framework comprises three modules: 1) multi-modal fusion (MF); 2) feature whitening (FW); and 3) mutual information maximization (MIM). The MF module employs iterative model averaging to facilitate learning without accessing multi-modal training data on clients. The FW module aims to address the limitations of training data heterogeneity by aligning data distributions across clients. The MIM module aims to model mutual information by maximizing the similarity between images from different modalities. For the experimental analyses, we focus our attention on multi-label classification and pixel-based classification tasks in RS. The results obtained using two benchmark archives show the effectiveness of the proposed framework when compared to state-of-the-art algorithms in the literature. The code of the proposed framework will be available at https://git.tu-berlin.de/rsim/multi-modal-FL.
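
The iterative model averaging mentioned for the MF module can be illustrated with a generic FedAvg-style aggregation step. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `federated_average`, the dictionary-of-lists parameter layout, and weighting by local sample count are all hypothetical choices made for illustration, and the sketch does not model how the paper handles modality-specific backbones.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    Assumption: all clients share the same parameter names and shapes.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        # Element-wise weighted mean of this parameter across clients.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# One aggregation step with two toy clients holding 1 and 3 samples.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
global_model = federated_average(clients, client_sizes=[1, 3])
print(global_model)
```

In a full training loop, the server would send the averaged parameters back to the clients, each client would train locally for some epochs, and the step would repeat every communication round. Only model parameters cross the network; the local images never leave the client.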

Paper Structure

This paper contains 21 sections, 7 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: An illustration of our framework. For the sake of simplicity, it is assumed that the images distributed across clients are associated with two different data modalities.
  • Figure 2: $F_1$ score versus communication round obtained by the proposed framework for different numbers $K$ of clients under DS1-BEN.
  • Figure 3: An example of BigEarthNet-MM image pairs with the true multi-labels and the multi-labels assigned by FedAvg, SCAFFOLD, MOON, FedDC, and the proposed framework.