Transformer-based Federated Learning for Multi-Label Remote Sensing Image Classification

Barış Büyüktaş, Kenneth Weitzel, Sebastian Völkers, Felix Zailskas, Begüm Demir

TL;DR

This work tackles federated learning for multi-label remote sensing image classification under non-IID client data. It evaluates three transformer-based architectures (MLP-Mixer, ConvMixer, and PoolFormer) against ResNet-50 using FedAvg and MOON on the BigEarthNet-S2 dataset, focusing on robustness to heterogeneity, local training cost, and aggregation cost. Findings indicate transformers can improve generalization under higher data heterogeneity, with PoolFormer offering a strong balance due to lower aggregation cost, while MOON provides larger gains for CNN baselines than for transformers. The results yield practical guidelines for selecting transformer architectures in RS FL and suggest that FedAvg with transformers can suffice in many non-IID scenarios, potentially extending to other remote sensing tasks.

Abstract

Federated learning (FL) aims to collaboratively learn deep learning model parameters from decentralized data archives (i.e., clients) without accessing the training data on the clients. However, the training data across clients might not be independent and identically distributed (non-IID), which can make it difficult to achieve optimal model convergence. In this work, we investigate the capability of state-of-the-art transformer architectures (MLP-Mixer, ConvMixer, and PoolFormer) to address the challenges related to non-IID training data across clients in the context of FL for multi-label classification (MLC) problems in remote sensing (RS). The considered transformer architectures are compared among themselves and with the ResNet-50 architecture in terms of their: 1) robustness to training data heterogeneity; 2) local training complexity; and 3) aggregation complexity under different non-IID levels. The experimental results obtained on the BigEarthNet-S2 benchmark archive demonstrate that the considered architectures increase the generalization ability at the cost of higher local training and aggregation complexities. On the basis of our analysis, we derive guidelines for selecting a suitable transformer architecture in the context of FL for RS MLC. The code of this work is publicly available at https://git.tu-berlin.de/rsim/FL-Transformer.
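
To make the aggregation step concrete, the following is a minimal sketch of FedAvg-style parameter averaging, in which the server combines client parameters weighted by each client's number of training samples. It is an illustration only and is not taken from the authors' repository: the function name `fedavg`, the `Params` type, and the representation of parameters as flat lists of floats are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical representation: each model is a dict mapping a
# parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighted by local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Each parameter entry is visited once per client, so the work
        # grows with (number of clients) x (number of parameters).
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    sizes = [1, 3]  # the second client holds three times as many samples
    print(fedavg(clients, sizes))
```

Because every parameter is visited once per client, the cost of this step grows with the model's parameter count, which is one reason aggregation complexity differs between architectures.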

Paper Structure (11 sections, 4 equations, 3 tables)