Bandwidth-adaptive Cloud-Assisted 360-Degree 3D Perception for Autonomous Vehicles

Faisal Hawlader, Rui Meireles, Gamal Elghazaly, Ana Aguiar, Raphaël Frank

TL;DR

This approach uses transformer-based models to fuse multi-camera sensor data into a comprehensive Bird's-Eye View (BEV) representation, enabling accurate 360-degree 3D object detection. It reduces overall latency by using Vehicle-to-Everything (V2X) communication to offload part of the processing to the cloud.

Abstract

A key challenge for autonomous driving lies in maintaining real-time situational awareness of surrounding obstacles under strict latency constraints. High processing requirements coupled with limited onboard computational resources can introduce processing delays, particularly in complex urban settings. To address this, we propose leveraging Vehicle-to-Everything (V2X) communication to partially offload processing to the cloud, where compute resources are abundant, thus reducing overall latency. Our approach uses transformer-based models to fuse multi-camera sensor data into a comprehensive Bird's-Eye View (BEV) representation, enabling accurate 360-degree 3D object detection. The computation is dynamically split between the vehicle and the cloud based on the number of layers processed locally and the quantization level of the features. To further reduce network load, we apply feature vector clipping and compression prior to transmission. In a real-world experimental evaluation, our hybrid strategy achieved a 72% reduction in end-to-end latency compared to a traditional onboard solution. To adapt to fluctuating network conditions, we introduce a dynamic optimization algorithm that selects the split point and quantization level to maximize detection accuracy while satisfying real-time latency constraints. Trace-based evaluation under realistic bandwidth variability shows that this adaptive approach improves accuracy by up to 20% over static parameterization while achieving the same latency.
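The clipping, quantization and compression step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline. The helper names `quantize_and_compress` and `decompress_and_dequantize`, the symmetric clipping bound `clip`, and the use of `zlib` are all hypothetical choices. Python also has no native FP8 type, so the lowest-precision mode substitutes uniform 8-bit integer quantization (`"int8"`) for FP8.

```python
import struct
import zlib
from typing import List

# Hypothetical precision labels; "int8" stands in for FP8 (no native FP8 in Python).
_FLOAT_FORMATS = {"fp32": "f", "fp16": "e"}  # struct codes: single / IEEE half


def quantize_and_compress(features: List[float], precision: str, clip: float) -> bytes:
    """Clip, quantize and zlib-compress a flattened feature vector (sketch)."""
    if clip <= 0:
        raise ValueError("clip must be positive")
    # Clip to [-clip, clip] so rare outliers do not dominate the value range.
    clipped = [max(-clip, min(clip, x)) for x in features]
    n = len(clipped)
    if precision in _FLOAT_FORMATS:
        payload = struct.pack(f"<{n}{_FLOAT_FORMATS[precision]}", *clipped)
    elif precision == "int8":
        # Assumed stand-in for FP8: symmetric uniform 8-bit quantization.
        scale = clip / 127.0
        payload = struct.pack(f"<{n}b", *(round(x / scale) for x in clipped))
    else:
        raise ValueError(f"unsupported precision: {precision}")
    # Lossless compression of the quantized bytes before transmission.
    return zlib.compress(payload)


def decompress_and_dequantize(blob: bytes, precision: str, clip: float) -> List[float]:
    """Inverse of quantize_and_compress, as the cloud side would apply it."""
    raw = zlib.decompress(blob)
    if precision in _FLOAT_FORMATS:
        code = _FLOAT_FORMATS[precision]
        n = len(raw) // struct.calcsize(code)
        return list(struct.unpack(f"<{n}{code}", raw))
    if precision == "int8":
        scale = clip / 127.0
        return [q * scale for q in struct.unpack(f"<{len(raw)}b", raw)]
    raise ValueError(f"unsupported precision: {precision}")


# Toy feature vector; real feature maps at a split layer would be far larger.
feats = [0.01 * i for i in range(-500, 500)]
for p in ("fp32", "fp16", "int8"):
    blob = quantize_and_compress(feats, p, clip=3.0)
    print(p, "compressed bytes:", len(blob))
```

Clipping before quantization bounds the value range, so the 8-bit grid is not stretched by rare outliers. The trade-off is that clipped values lose their original magnitude, which is one reason the clipping bound and precision would need to be tuned against detection accuracy.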

Paper Structure

This paper contains 18 sections, 1 theorem, 4 equations, 11 figures, 4 tables, and 5 algorithms.

Key Result

Theorem 1

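$\mathrm{optPar}()$ returns the parameter tuple $(split, q)$, i.e., the split point and the feature quantization level, with the highest NDS (nuScenes Detection Score) among all tuples that satisfy the latency bound $lat_{total} \leq lat_{max}$. If no such tuple exists, it returns the tuple that minimizes $lat_{total}$.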
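The selection rule stated in Theorem 1 can be illustrated with an exhaustive search over a small table of profiled configurations. This is a hypothetical sketch, not the paper's `optPar()` algorithm. The following are all assumptions made for illustration:

* `Profile`, `total_latency_ms` and `opt_par`
* the additive latency model (onboard compute + uplink transmission + a fixed round-trip term `rtt_ms` + cloud compute)
* the example numbers

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical key: (split layer, quantization label such as "fp16").
Config = Tuple[int, str]


@dataclass(frozen=True)
class Profile:
    """Offline-profiled costs of one (split, q) configuration (assumed fields)."""
    local_ms: float     # onboard compute up to and including the split layer
    payload_bytes: int  # compressed feature vector sent over the uplink
    cloud_ms: float     # remaining layers executed in the cloud
    nds: float          # detection accuracy measured for this configuration


def total_latency_ms(p: Profile, bandwidth_mbps: float, rtt_ms: float) -> float:
    """Assumed additive model: local + transmission + fixed round trip + cloud."""
    tx_ms = p.payload_bytes * 8 / (bandwidth_mbps * 1000.0)  # bits / (bits per ms)
    return p.local_ms + tx_ms + rtt_ms + p.cloud_ms


def opt_par(profiles: Dict[Config, Profile], bandwidth_mbps: float,
            rtt_ms: float, lat_max_ms: float) -> Config:
    """Highest-NDS config meeting lat_max_ms, else the lowest-latency config."""
    lat = {k: total_latency_ms(p, bandwidth_mbps, rtt_ms) for k, p in profiles.items()}
    feasible = [k for k in profiles if lat[k] <= lat_max_ms]
    if feasible:
        return max(feasible, key=lambda k: profiles[k].nds)
    return min(profiles, key=lambda k: lat[k])


# Made-up numbers for illustration only; they are not measurements from the paper.
profiles = {
    (1, "fp32"): Profile(local_ms=20, payload_bytes=4_000_000, cloud_ms=15, nds=0.52),
    (1, "fp8"): Profile(local_ms=18, payload_bytes=1_000_000, cloud_ms=15, nds=0.49),
    (3, "fp16"): Profile(local_ms=60, payload_bytes=1_500_000, cloud_ms=10, nds=0.51),
}
print(opt_par(profiles, bandwidth_mbps=200, rtt_ms=10, lat_max_ms=150))  # → (3, 'fp16')
```

With five split layers and three precisions (as in Figure 5), the table has only 15 entries. Re-running the search whenever the bandwidth estimate changes is therefore cheap, which suits the trace-driven adaptation described in the abstract.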

Figures (11)

  • Figure 1: In the Onboard Computing scenario, the BEVFormer model runs locally and transmits detection results as Collective Perception Messages (CPMs) over ITS-G5. In the Hybrid Computing scenario, a compressed feature vector is sent via C-V2X to the cloud for intensive processing, and the detection results are broadcast to nearby vehicles.
  • Figure 2: Overview of the containers included in the CPM format, as defined by the ETSI standard [ts2023103].
  • Figure 3: CPM transmission latency versus distance between a moving vehicle (25 km/h average speed) and a stationary receiver at fixed coordinates (longitude: 6.161993, latitude: 49.626478). The dashed line shows the average latency of 4.10 ms.
  • Figure 4: Feature vector size and extraction time versus split depth. Solid lines represent feature extraction time (left y-axis), while dashed lines indicate feature size (right y-axis). Lower-precision quantization (e.g., FP16, FP8) reduces both extraction time and feature size.
  • Figure 5: Transmission latency of feature vectors from vehicle to cloud across five split layers for FP32, FP16, and FP8 over a 5G network using C-V2X. FP8 demonstrates the lowest and most stable latency, suitable for real-time transmission. FP32 exhibits the highest latency and variability, especially at earlier split layers due to larger feature size.