Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework
Xiaolong Li, Jianhao Wei, Haidong Wang, Li Dong, Ruoyang Chen, Changyan Yi, Jun Cai, Dusit Niyato, Xuemin Shen
TL;DR
This work introduces SV-FDT, a surveillance video-assisted federated digital twin framework for intelligent transportation systems that incorporate pedestrians and vehicles in-the-loop. It deploys a cloud-edge-end architecture where end devices harvest surveillance videos, the edge performs semantic segmentation and twin-agent modeling to generate local DTs, and the cloud federates these into a real-time global DT while preserving privacy. The approach leverages semantic segmentation, semantic-to-code transformation, and CARLA-based simulation to replicate complex pedestrian-vehicle interactions across regions, enabling applications such as adaptive traffic-signal control and emergency management. Case studies demonstrate improved mirroring delay, recognition accuracy, and user-experienced QoE compared with traditional terminal-server systems, highlighting SV-FDT’s potential for scalable, real-time ITS optimization in dynamic urban settings.
Abstract
In intelligent transportation systems (ITSs), incorporating pedestrians and vehicles in-the-loop is crucial for developing realistic and safe traffic management solutions. However, existing approaches fall short of simulating complex real-world ITS scenarios, primarily due to the lack of a digital twin implementation framework for characterizing interactions between pedestrians and vehicles at different locations in different traffic environments. In this article, we propose a surveillance video-assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop. Specifically, SV-FDT builds comprehensive pedestrian-vehicle interaction models by leveraging multi-source traffic surveillance videos. Its architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time. We analyze key design requirements and challenges and present core guidelines for SV-FDT's system implementation. A testbed evaluation demonstrates its effectiveness in optimizing traffic management. Comparisons with traditional terminal-server frameworks highlight SV-FDT's advantages in mirroring delays, recognition accuracy, and subjective evaluation. Finally, we identify some open challenges and discuss future research directions.
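As a rough illustration of the three-layer data flow described in the abstract, the following minimal Python sketch shows how an edge node might turn segmentation output into a local twin, and how a cloud node might combine local twins into a global view. It is a sketch under stated assumptions, not the authors' implementation: the names `TwinAgent`, `LocalTwin`, `build_local_twin`, and `federate` are hypothetical, the `detections` list stands in for the output of a semantic segmentation model, and the "newest twin per region" merge rule is an assumed placeholder for the paper's federation step.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TwinAgent:
    """Hypothetical twin-agent record: one pedestrian or vehicle in a region."""
    agent_id: str
    kind: str        # "pedestrian" or "vehicle"
    position: tuple  # (x, y) in the region's local coordinates


@dataclass
class LocalTwin:
    """Hypothetical local digital twin system (LDTS) state for one region."""
    region: str
    timestamp: float
    agents: list = field(default_factory=list)


def build_local_twin(region, timestamp, detections):
    """Edge layer (assumed): turn segmentation output into a local twin.

    `detections` is a list of (label, position) pairs standing in for a
    segmentation model's output. Only pedestrian and vehicle states are
    kept; raw video frames are never part of the returned object.
    """
    agents = [
        TwinAgent(f"{region}/{i}", kind, pos)
        for i, (kind, pos) in enumerate(detections)
        if kind in ("pedestrian", "vehicle")
    ]
    return LocalTwin(region, timestamp, agents)


def federate(local_twins):
    """Cloud layer (assumed): keep the newest local twin for each region."""
    global_twin = {}
    for twin in local_twins:
        current = global_twin.get(twin.region)
        if current is None or twin.timestamp > current.timestamp:
            global_twin[twin.region] = twin
    return global_twin
```

In this sketch only agent-level state crosses the edge-to-cloud boundary, which mirrors the privacy motivation of keeping surveillance video at the edge; a real system would also need cross-region agent matching and time synchronization, which are omitted here.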
