FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients

DaiXun Li, Weiying Xie, ZiXuan Wang, YiBing Lu, Yunsong Li, Leyuan Fang

TL;DR

FedDiff addresses the need for privacy-preserving, distributed fusion of heterogeneous remote-sensing (RS) data by introducing a diffusion-model-driven federated framework with two diffusion branches, one for hyperspectral imagery (HSI) and one for LiDAR. The approach couples a lightweight, frequency-aware fusion mechanism with a low-rank feature decomposition module to reduce communication while maintaining high classification accuracy. Key technical contributions include the multi-modal federated interaction module, frequency-domain enhancement within a U-Net backbone, and a loss that combines cross-entropy and MSE terms with weight regularization. Experiments on Houston2013, Trento, and MUUFL report higher classification performance and significantly lower communication cost than prior methods, highlighting FedDiff's practical potential for distributed RS tasks.
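To make the communication argument concrete, the following is a minimal sketch of how a low-rank factorization can shrink what a client transmits. It is not the paper's module, whose details are not given in this summary: it uses a plain truncated SVD as a stand-in, and the function name `low_rank_compress`, the feature shape (256 x 128), and the rank (8) are all hypothetical values chosen for illustration.

```python
import numpy as np


def low_rank_compress(features, rank):
    """Truncated SVD (a stand-in technique, not FedDiff's actual module).

    Returns factors (a, b) such that a @ b approximates `features`.
    """
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (n, rank), singular values folded in
    b = vt[:rank, :]            # shape (rank, d)
    return a, b


rng = np.random.default_rng(0)

# Hypothetical client feature matrix: 256 positions x 128 channels,
# constructed to be approximately rank 8 plus a little noise.
n, d, true_rank = 256, 128, 8
features = rng.standard_normal((n, true_rank)) @ rng.standard_normal((true_rank, d))
features += 0.01 * rng.standard_normal((n, d))

a, b = low_rank_compress(features, rank=8)

dense_floats = features.size      # values sent without compression
sent_floats = a.size + b.size     # values sent as two low-rank factors
rel_error = np.linalg.norm(features - a @ b) / np.linalg.norm(features)

print(f"dense: {dense_floats} floats, low-rank: {sent_floats} floats")
print(f"relative reconstruction error: {rel_error:.4f}")
```

In this toy setting the two factors hold n*r + r*d values instead of n*d, so the saving grows as the rank r shrinks relative to the feature dimensions; how well real intermediate features tolerate such truncation is an empirical question that the sketch does not answer.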

Abstract

With the rapid development of imaging sensor technology in the field of remote sensing, multi-modal remote sensing data fusion has emerged as a crucial research direction for land cover classification tasks. While diffusion models have made great progress in generative modeling and image classification tasks, existing models primarily focus on single-modality and single-client control; that is, the diffusion process is driven by a single modality on a single computing node. To facilitate the secure fusion of heterogeneous data from clients, it is necessary to enable distributed multi-modal control, such as privately merging the hyperspectral data of organization A and the LiDAR data of organization B on each base station client. In this study, we propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, in which the data of the two modalities are fed into separate branches of the encoder. Our key insight is that diffusion models driven by different modalities are inherently complementary in terms of the latent denoising steps on which bilateral connections can be built. To address the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure and introduce a lightweight communication module. Qualitative and quantitative experiments validate the superiority of our framework in terms of image quality and conditional consistency.

Paper Structure (27 sections, 29 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: The FedDiff framework in a multi-client, multi-modal network comprising HSI clients and LiDAR clients. Clients do not need to exchange original data; the global model is updated by transmitting intermediate features, which achieves multi-modal data fusion.
  • Figure 2: Overview of the proposed FedDiff framework. The framework includes (1) Local Diffusion Module, (2) Global Data Fusion, and a classifier. The architecture will be optimized with respect to the mean squared error (MSE) loss of the classifier and task-specific loss for scene classification. In the global data fusion phase, the interactive information is propagated to each client through federated learning, in which the multi-modal federated learning communication architecture is used to reduce communication cost.
  • Figure 3: Visualization of classification results for individual categories, specifically categories 12 (Parking lot 1), 14 (Tennis court), and 15 (Running track), on the Houston2013 dataset.
  • Figure 4: Visualization of false-color HSI and LiDAR images using different comparison methods based on the Houston2013 dataset.
  • Figure 5: Visualization of false-color HSI and LiDAR images using different comparison methods based on the Trento dataset.
  • ...and 1 more figures