FedRC: A Rapid-Converged Hierarchical Federated Learning Framework in Street Scene Semantic Understanding
Wei-Bin Kou, Qingfeng Lin, Ming Tang, Shuai Wang, Guangxu Zhu, Yik-Chung Wu
TL;DR
Street scene semantic understanding (TriSU) suffers from inter-city domain shift, which hinders generalization in distributed autonomous driving (AD). The paper introduces FedRC, a rapid-converged Hierarchical Federated Learning (HFL) framework that models both per-image pixel distributions and per-dataset distributions as Gaussians and uses the Bhattacharyya distance to compute data-aware aggregation weights. This distribution-centric weighting accelerates convergence and improves segmentation performance on Cityscapes and CamVid, with qualitative validation in CARLA, marking a first integration of Gaussian-based aggregation in HFL for TriSU. The work demonstrates that data-aware, statistically grounded aggregation can improve the robustness and efficiency of distributed semantic understanding systems for autonomous driving, with potential extension to other AD tasks and multi-modal data.
Abstract
Street Scene Semantic Understanding (denoted as TriSU) is a crucial but complex task for autonomous driving (AD) vehicles distributed worldwide (e.g., Tesla). Its inference model generalizes poorly because of inter-city domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization, but it converges slowly because vehicles' surroundings are heterogeneous across cities. Going beyond existing HFL works, which have limited capability on complex tasks, we propose a rapid-converged heterogeneous HFL framework (FedRC) to address the inter-city data heterogeneity and accelerate HFL model convergence. In the proposed FedRC framework, both individual RGB images and whole RGB datasets are modelled as Gaussian distributions when designing the HFL aggregation weights. This approach not only differentiates individual RGB samples instead of treating them as equal, as is typical, but also considers both data volume and statistical properties rather than data quantity alone. Extensive experiments on the TriSU task using cross-city datasets demonstrate that FedRC converges faster than the state-of-the-art benchmark by 38.7%, 37.5%, 35.5%, and 40.6% in terms of mIoU, mPrecision, mRecall, and mF1, respectively. Furthermore, qualitative evaluations in the CARLA simulation environment confirm that the proposed FedRC framework delivers top-tier performance.
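To make the idea of distribution-aware aggregation concrete, the following is a minimal sketch rather than the authors' implementation. It uses the standard closed-form Bhattacharyya distance between two univariate Gaussians. The weighting rule (sample count multiplied by exp(-distance) to a reference distribution, then normalized) is an assumption chosen for illustration, as are the names `bhattacharyya_gaussian` and `aggregation_weights` and the example statistics.

```python
import math


def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between N(mu1, var1) and N(mu2, var2)."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def aggregation_weights(client_stats, reference):
    """Turn per-client Gaussian statistics into normalized aggregation weights.

    client_stats: list of (mean, variance, n_samples), one entry per client.
    reference:    (mean, variance) of a reference distribution.

    Hypothetical rule (not taken from the paper): each client's score is
    n_samples * exp(-D_B), so both data volume and statistical similarity
    to the reference influence the weight.
    """
    ref_mu, ref_var = reference
    scores = [
        n * math.exp(-bhattacharyya_gaussian(mu, var, ref_mu, ref_var))
        for mu, var, n in client_stats
    ]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Made-up pixel-intensity statistics for three clients with equal data volume.
    clients = [(0.45, 0.04, 1000), (0.60, 0.05, 1000), (0.20, 0.02, 1000)]
    print(aggregation_weights(clients, reference=(0.45, 0.04)))
```

With equal sample counts, a client whose statistics match the reference receives the largest weight under this rule; with equal statistics, the rule reduces to ordinary volume-proportional weighting.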
