FedRC: A Rapid-Converged Hierarchical Federated Learning Framework in Street Scene Semantic Understanding

Wei-Bin Kou; Qingfeng Lin; Ming Tang; Shuai Wang; Guangxu Zhu; Yik-Chung Wu

TL;DR

Street scene semantic understanding (TriSU) models suffer from inter-city domain shifts that hinder generalization in distributed autonomous driving. The paper introduces FedRC, a rapid-converged Hierarchical Federated Learning (HFL) framework that models both the pixel distribution of each image and the distribution of each dataset as Gaussians, and uses the Bhattacharyya distance between these Gaussians to compute data-aware aggregation weights. This distribution-centric weighting accelerates convergence and improves segmentation performance on Cityscapes and CamVid, with qualitative validation in CARLA. The authors present it as the first integration of Gaussian-based aggregation into HFL for TriSU. The work shows that data-aware, statistically grounded aggregation can make distributed semantic understanding for autonomous driving more robust and efficient, and it could potentially be extended to other AD tasks and to multi-modal data.
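For reference, the Bhattacharyya distance between two univariate Gaussians $\mathcal{N}(\mu_1, \delta_1^2)$ and $\mathcal{N}(\mu_2, \delta_2^2)$, written with the $\mu, \delta^2$ notation of Figure 2, has the standard closed form below. It is zero when the two distributions coincide and grows as their means separate or their variances diverge. How FedRC turns this distance into aggregation weights is defined in the paper's Step IV and is not reproduced here.

```latex
D_B\big(\mathcal{N}(\mu_1,\delta_1^2),\,\mathcal{N}(\mu_2,\delta_2^2)\big)
  = \frac{1}{4}\,\frac{(\mu_1-\mu_2)^2}{\delta_1^2+\delta_2^2}
  + \frac{1}{2}\,\ln\!\frac{\delta_1^2+\delta_2^2}{2\,\delta_1\,\delta_2}
```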

Abstract

Street Scene Semantic Understanding (denoted as TriSU) is a crucial but complex task for autonomous driving (AD) vehicles distributed worldwide (e.g., Tesla). Its inference model generalizes poorly because of inter-city domain shift. Hierarchical Federated Learning (HFL) offers a potential way to improve TriSU model generalization, but it converges slowly because vehicles' surroundings are heterogeneous across cities. Going beyond existing HFL works, which are ill-suited to complex tasks, we propose a rapid-converged heterogeneous HFL framework (FedRC) that addresses inter-city data heterogeneity and accelerates HFL model convergence. In FedRC, both a single RGB image and an RGB dataset are modelled as Gaussian distributions when designing the HFL aggregation weights. This approach not only differentiates each RGB sample instead of weighting all samples equally, as is typical, but also accounts for both data volume and statistical properties rather than data quantity alone. Extensive experiments on the TriSU task using cross-city datasets demonstrate that FedRC converges faster than the state-of-the-art benchmark by 38.7%, 37.5%, 35.5%, and 40.6% in terms of mIoU, mPrecision, mRecall, and mF1, respectively. Furthermore, qualitative evaluations in the CARLA simulation environment confirm that FedRC delivers top-tier performance.
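The idea of modelling both a single image and a whole dataset as Gaussians can be illustrated with the minimal sketch below. It assumes, without confirmation from this summary, that (a) each image is summarised by one univariate Gaussian fitted to all of its channel intensities (Figure 3 reports a single mean and variance per sample), and (b) a dataset Gaussian is obtained by moment-matching the size-weighted mixture of its parts, so a vehicle pools its images and an edge or cloud server pools its children. The names `Gaussian`, `image_gaussian`, and `pool` are hypothetical and are not the authors' code.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Gaussian:
    n: int      # number of RGB images summarised (1 for a single image)
    mu: float   # mean pixel intensity
    var: float  # pixel-intensity variance (delta^2 in Figure 2)


def image_gaussian(pixels: Sequence[tuple[int, int, int]]) -> Gaussian:
    """Assumed Step I: fit one univariate Gaussian to every channel value of one image."""
    values = [float(c) for px in pixels for c in px]
    if not values:
        raise ValueError("empty image")
    mu = fmean(values)
    return Gaussian(n=1, mu=mu, var=pvariance(values, mu))


def pool(parts: Iterable[Gaussian]) -> Gaussian:
    """Assumed Step II: moment-match the size-weighted mixture of several Gaussians."""
    parts = list(parts)
    n = sum(p.n for p in parts)
    if n == 0:
        raise ValueError("nothing to pool")
    mu = sum(p.n * p.mu for p in parts) / n
    # E[x^2] of the mixture, then Var = E[x^2] - mu^2 (clamped against rounding error).
    second = sum(p.n * (p.var + p.mu ** 2) for p in parts) / n
    return Gaussian(n=n, mu=mu, var=max(second - mu ** 2, 0.0))


# Usage: two tiny "images" pooled into one vehicle-level dataset Gaussian.
img_a = image_gaussian([(0, 0, 0), (255, 255, 255)])
img_b = image_gaussian([(100, 100, 100), (100, 100, 100)])
vehicle = pool([img_a, img_b])
```

The same `pool` call can be applied again to vehicle-level Gaussians to form an edge server's virtual dataset, whose size is then the sum of its vehicles' dataset sizes under this assumption.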

Paper Structure

This paper contains 27 sections, 18 equations, 6 figures, 6 tables, and 2 algorithms.

Introduction
Related Work
Federated Learning (FL)
Street Scene Semantic Understanding (TriSU)
Methodology
HFL Formulation
Vehicle Training
Edge Aggregation
Cloud Aggregation
FedRC Framework
Step I: Distribution Estimation of Single RGB Image
Step II: RGB Dataset Distribution Estimation of Vehicles, Edge Servers and Cloud Server
Step III: Distance between Local and Global Dataset
Step IV: FedRC Weights Calculation
Complexity Analysis
...and 12 more sections
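The HFL formulation listed in the outline (vehicle training, edge aggregation, cloud aggregation) can be sketched as two tiers of weighted parameter averaging. This is a minimal illustration that assumes models are flattened lists of floats and omits local vehicle training entirely; the aggregation weights are left as inputs, so either proportion-based weights or data-aware weights such as FedRC's could be passed in. The names `weighted_average` and `hfl_round` are hypothetical.

```python
from typing import Sequence

Params = list[float]  # a flattened model, for illustration only


def weighted_average(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return sum_k (w_k / sum_j w_j) * models[k]."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    out = [0.0] * len(models[0])
    for params, w in zip(models, weights):
        for i, p in enumerate(params):
            out[i] += (w / total) * p
    return out


def hfl_round(vehicle_models: Sequence[Sequence[Params]],
              vehicle_weights: Sequence[Sequence[float]],
              edge_weights: Sequence[float]) -> Params:
    """Edge aggregation over each edge server's vehicles, then cloud aggregation over edges."""
    edge_models = [weighted_average(ms, ws)
                   for ms, ws in zip(vehicle_models, vehicle_weights)]
    return weighted_average(edge_models, edge_weights)


# Usage: edge 1 serves two vehicles, edge 2 serves one.
cloud = hfl_round(
    vehicle_models=[[[1.0, 0.0], [3.0, 2.0]], [[5.0, 4.0]]],
    vehicle_weights=[[1.0, 1.0], [1.0]],
    edge_weights=[2.0, 1.0],
)
```

Here edge 1 averages to `[2.0, 1.0]`, and the cloud combines it with edge 2's `[5.0, 4.0]` at a 2:1 ratio.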

Figures (6)

Figure 1: The illustration of Hierarchical Federated Learning (HFL) on TriSU task. M is the set of participating cities.
Figure 2: The illustration of estimated RGB image Gaussians and the RGB dataset Gaussian. $n, \mu, \delta^2$ denote the dataset size and the mean and variance of the dataset Gaussian distribution, respectively.
Figure 3: The normalized histogram and probability density function (PDF) of two RGB samples. For example, the Gaussian estimated for "RGB Sample #1" has mean 121.97 and variance 55.54.
Figure 4: FedRC result. The legend 'Client{1,1}-578-0.53-0.41' in the Edge 1 panel consists of four '-'-separated fields: vehicle ID, dataset size, proportion-based weight, and FedRC weight. The legend 'Edge1-1081' in the same panel means that Edge 1 has a virtual dataset of size 1081. The legends in the Edge 2, Edge 3, and Cloud panels follow the same convention. FedRC weights are better suited to aggregation than proportion-based weights. For example, in the Cloud panel the Edge 2 distribution is far from the Cloud distribution, so Edge 2 should receive a smaller aggregation weight; the FedRC weight reflects this, whereas the proportion-based weight does not.
Figure 5: Convergence comparison. Results show that FedRC converges faster than all other FL algorithms across all metrics.
...and 1 more figure
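The contrast in Figure 4 between proportion-based and distribution-aware weights can be illustrated with the sketch below. The proportion-based weight is the dataset's share of its parent's data, which matches the Figure 4 legend (578 / 1081 rounds to 0.53). The distance-aware rule, w_k proportional to n_k * exp(-D_B(local_k, global)), is a HYPOTHETICAL stand-in chosen only to show how a Bhattacharyya distance to the global (parent) Gaussian can shrink the weight of a far-away dataset; it is not FedRC's Step IV formula. All function names are hypothetical.

```python
import math
from typing import Sequence


def bhattacharyya(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2); variances must be > 0."""
    s = var1 + var2
    return 0.25 * (mu1 - mu2) ** 2 / s + 0.5 * math.log(s / (2.0 * math.sqrt(var1 * var2)))


def proportion_weights(sizes: Sequence[int]) -> list[float]:
    """Baseline: w_k = n_k / sum_j n_j (data quantity only)."""
    total = sum(sizes)
    return [n / total for n in sizes]


def distance_aware_weights(local: Sequence[tuple[int, float, float]],
                           glob: tuple[float, float]) -> list[float]:
    """HYPOTHETICAL rule (not FedRC's): w_k proportional to n_k * exp(-D_B(local_k, global))."""
    scores = [n * math.exp(-bhattacharyya(mu, var, glob[0], glob[1]))
              for n, mu, var in local]
    total = sum(scores)
    return [s / total for s in scores]


# Figure 4 legend: Client{1,1} holds 578 of Edge 1's 1081 samples.
share = round(578 / 1081, 2)

# Two equally sized children: one matches the global Gaussian, one is far from it.
children = [(100, 120.0, 50.0), (100, 160.0, 50.0)]
baseline = proportion_weights([n for n, _, _ in children])
aware = distance_aware_weights(children, glob=(120.0, 50.0))
```

With equal data volumes the baseline assigns 0.5 to each child, while the distance-aware rule gives almost all the weight to the child whose Gaussian matches the global one (its distance is 0, versus 4.0 for the other), mirroring the qualitative behaviour described for Edge 2 in Figure 4.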
