Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework

Xiaolong Li, Jianhao Wei, Haidong Wang, Li Dong, Ruoyang Chen, Changyan Yi, Jun Cai, Dusit Niyato, Xuemin Shen

TL;DR

This work introduces SV-FDT, a surveillance video-assisted federated digital twin framework for intelligent transportation systems that incorporate pedestrians and vehicles in-the-loop. It deploys a cloud-edge-end architecture where end devices harvest surveillance videos, the edge performs semantic segmentation and twin-agent modeling to generate local DTs, and the cloud federates these into a real-time global DT while preserving privacy. The approach leverages semantic segmentation, semantic-to-code transformation, and CARLA-based simulation to replicate complex pedestrian-vehicle interactions across regions, enabling applications such as adaptive traffic-signal control and emergency management. Case studies demonstrate improved mirroring delay, recognition accuracy, and user-experienced QoE compared with traditional terminal-server systems, highlighting SV-FDT’s potential for scalable, real-time ITS optimization in dynamic urban settings.

Abstract

In intelligent transportation systems (ITSs), incorporating pedestrians and vehicles in-the-loop is crucial for developing realistic and safe traffic management solutions. However, existing solutions fall short of simulating complex real-world ITS scenarios, primarily due to the lack of a digital twin implementation framework for characterizing interactions between pedestrians and vehicles at different locations in different traffic environments. In this article, we propose a surveillance video-assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop. Specifically, SV-FDT builds comprehensive pedestrian-vehicle interaction models by leveraging multi-source traffic surveillance videos. Its architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time. We analyze key design requirements and challenges and present core guidelines for SV-FDT's system implementation. A testbed evaluation demonstrates its effectiveness in optimizing traffic management. Comparisons with traditional terminal-server frameworks highlight SV-FDT's advantages in mirroring delay, recognition accuracy, and subjective evaluation. Finally, we identify some open challenges and discuss future research directions.

Paper Structure

This paper contains 14 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: The overall architecture of the proposed SV-FDT system. SV-FDT consists of three layers: the end layer contains multiple terminal devices; the edge layer handles tasks for data preprocessing, semantic segmentation, and local DT construction; and the cloud layer constructs the global model and conducts model inferences for ITS applications.
  • Figure 2: Systematic framework of the proposed SV-FDT. The semantic segmentation algorithm generates semantic data for vehicles: <vehicle, id="A", speed=2m/s>, <vehicle, id="B", speed=3m/s>. The semantic-to-code transformation module converts the semantic data into the following traffic codes: vehicle1.setID("A"), vehicle1.setSpeed(2), vehicle2.setID("B"), vehicle2.setSpeed(3). Local and global DT models then transmit and execute these codes.
  • Figure 3: The SV-FDT framework-based FDT testbed platform for optimizing traffic signal timing.
  • Figure 4: Optimized traffic signal time settings based on road width, pedestrian walking speed, and volume.
  • Figure 5: Comparisons on objective measurements.
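
The semantic-to-code transformation described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record format is inferred from the two examples in the caption, and the names `SemanticRecord`, `parse_semantic`, and `to_traffic_code` are hypothetical. The sketch only shows how semantic tuples might be mapped to the traffic-code strings the caption lists.

```python
import re
from dataclasses import dataclass


@dataclass
class SemanticRecord:
    """One semantic tuple, e.g. <vehicle, id="A", speed=2m/s> (format assumed from the caption)."""
    kind: str         # object class, e.g. "vehicle"
    obj_id: str       # identifier assigned by the segmentation stage
    speed_mps: float  # speed in metres per second


# Assumed textual layout of a semantic tuple; the real system may use another encoding.
_PATTERN = re.compile(r'<\s*(\w+)\s*,\s*id="([^"]+)"\s*,\s*speed=([0-9.]+)m/s\s*>')


def parse_semantic(text: str) -> SemanticRecord:
    """Parse one semantic tuple string into a record."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised semantic tuple: {text!r}")
    return SemanticRecord(match.group(1), match.group(2), float(match.group(3)))


def to_traffic_code(records: list[SemanticRecord]) -> list[str]:
    """Emit traffic-code strings in the style shown in the caption."""
    counters: dict[str, int] = {}
    lines: list[str] = []
    for rec in records:
        # Number objects per class so the first vehicle becomes "vehicle1", and so on.
        counters[rec.kind] = counters.get(rec.kind, 0) + 1
        var = f"{rec.kind}{counters[rec.kind]}"
        lines.append(f'{var}.setID("{rec.obj_id}")')
        lines.append(f"{var}.setSpeed({rec.speed_mps:g})")
    return lines


if __name__ == "__main__":
    tuples = ['<vehicle, id="A", speed=2m/s>', '<vehicle, id="B", speed=3m/s>']
    for line in to_traffic_code([parse_semantic(t) for t in tuples]):
        print(line)
```

Sending short code strings like these rather than raw video frames is consistent with the caption's statement that local and global DT models transmit and execute the codes. How the codes are actually executed in the simulator is not covered by this sketch.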