Radiance Field Delta Video Compression in Edge-Enabled Vehicular Metaverse

Matúš Dopiriak, Eugen Šlapak, Juraj Gazda, Devendra Singh Gurjar, Mohammad Abdullah Al Faruque, Marco Levorato

TL;DR

This work tackles physical-to-virtual synchronization in the vehicular metaverse by introducing Radiance Field Delta Video Compression (RFDVC). RFDVC uses distributed radiance fields (RFs) as digital twins that store photorealistic 3D urban scenes, so only the delta between the rendered scene and the actual camera frames needs to be transmitted. The method formulates an optimization over Delta-frames and uses delta segmentation that combines semantic masks (e.g., from YOLOv11) with fast segmentation (FastSAM). This achieves substantial data savings while maintaining fidelity for downstream tasks. Empirical results in CARLA-based urban scenarios show data savings of up to 71% against H.264 and 44% against H.265. SSIM also improves under moderate packet loss (a block error rate of 0.25), demonstrating robustness to URLLC-like network conditions. The approach combines RF-based 3D scene representations (the NeRF variant INGP and 3D Gaussian Splatting, 3DGS), semantic-aware delta encoding, and edge computing to enable scalable, photorealistic digital twins at the network edge with reduced bandwidth and latency.
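
The savings percentages compare encoded sizes, and a short calculation makes them concrete. This is a minimal sketch that assumes savings mean the relative reduction in total encoded bytes versus the baseline codec for the same frames. The paper's exact definition is not stated here, and the function name and byte counts are hypothetical.

```python
def data_savings(rfdvc_bytes: int, baseline_bytes: int) -> float:
    """Percent reduction of RFDVC's encoded size relative to a baseline codec.

    Assumed definition (hypothetical): 100 * (baseline - rfdvc) / baseline.
    """
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    return (baseline_bytes - rfdvc_bytes) * 100 / baseline_bytes


# Hypothetical byte counts for one batch of frames (not measured values).
h264_bytes = 1_000_000
rfdvc_bytes = 290_000
print(data_savings(rfdvc_bytes, h264_bytes))  # 71.0
```

Under this definition, a 71% saving means RFDVC needs about 29% of the baseline's bytes for the same frames. A negative value would mean RFDVC produced a larger stream.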

Abstract

Connected and autonomous vehicles (CAVs) offload computationally intensive tasks to multi-access edge computing (MEC) servers via vehicle-to-infrastructure (V2I) communication. This enables applications within the vehicular metaverse, which transforms the physical environment into a digital space for advanced analysis or predictive modeling. A core challenge is physical-to-virtual (P2V) synchronization through digital twins (DTs), which relies on MEC networks and ultra-reliable low-latency communication (URLLC). To address this, we introduce radiance field (RF) delta video compression (RFDVC). RFDVC uses an RF-encoder and RF-decoder architecture, with distributed RFs serving as DTs that store photorealistic 3D urban scenes in compressed form. The method extracts the differences between a CAV-frame, which captures actual traffic, and an RF-frame, which renders the empty scene from the same camera pose. These differences are encoded in batches and transmitted over the MEC network. Experiments show data savings of up to 71% against the H.264 codec and 44% against the H.265 codec under different conditions, such as lighting changes and rain. RFDVC also demonstrates resilience to transmission errors. In non-rainy conditions it significantly outperforms the standard codec, with up to a +0.26 structural similarity index measure (SSIM) improvement over H.264. It maintains a +0.18 SSIM improvement even in challenging rainy conditions. Both results are measured at a block error rate (BLER) of 0.25.
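
The BLER experiments can be made more intuitive by simulating independent block losses in the pixel domain. This proxy is crude and is our assumption. Real BLER applies to transport blocks of the encoded bitstream, where a single loss can corrupt later frames through inter-frame prediction. The optional `fill` argument lets lost regions fall back to a reference image such as the RF-frame. That is one plausible way a receiver holding the DT could conceal losses, but the paper's actual error model may differ. `drop_blocks` and its parameters are hypothetical.

```python
import numpy as np


def drop_blocks(frame, bler, block=16, fill=None, rng=None):
    """Simulate block errors on a decoded H x W x C frame (hypothetical helper).

    Each block x block tile is lost independently with probability `bler`.
    Lost tiles are replaced by the matching tile of `fill` (e.g. an RF-frame)
    or by black pixels when `fill` is None. The input frame is not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = frame.copy()
    h, w = frame.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            if rng.random() < bler:  # this tile is lost
                if fill is None:
                    out[y:y + block, x:x + block] = 0
                else:
                    out[y:y + block, x:x + block] = fill[y:y + block, x:x + block]
    return out


# Example: a random 64x64 stand-in "CAV-frame" and a flat gray stand-in "RF-frame".
cav = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
rf = np.full_like(cav, 128)
# Same seed -> same loss pattern, so only the concealment strategy differs.
lossy_black = drop_blocks(cav, bler=0.25, rng=np.random.default_rng(1))
lossy_rf = drop_blocks(cav, bler=0.25, fill=rf, rng=np.random.default_rng(1))
```

Comparing `lossy_black` and `lossy_rf` against `cav` with an SSIM implementation would illustrate why a reference background can soften the impact of losses. It says nothing definitive about the paper's reported numbers.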

Paper Structure

This paper contains 19 sections, 27 equations, 11 figures, 4 tables, and 2 algorithms.

Figures (11)

  • Figure 1:
  • Figure 2: (a) The RF-encoder extracts pairs of input CAV-frames and RF-frames from the same camera poses and segments Delta-frames from them using the delta segmentation (DS) algorithm. The Delta-frames are then encoded with the codec (H.264 or H.265) and transmitted over the channel. (b) The RF-decoder subsequently decodes the Delta-frames and combines them with RF-frames to reconstruct the Rec-frames.
  • Figure 3: Differences between the CAV-frame and RF-frame are segmented using FastSAM, chosen for its zero-shot generalization and IoU metrics, while YOLOv11 ensures segmentation of critical classes (e.g., pedestrians and vehicles). The Delta-frame overlays the YOLO masks onto the FastSAM masks, with irrelevant pixels set to black.
  • Figure 4: Examples of RF-frames extracted from 3DGS and INGP RF models trained in morning conditions with no vehicles or pedestrians present, followed by CAV-frames with sparse traffic at noon and dense traffic at noon, in the evening, in wet conditions, and in rain.
  • Figure 5: CARLA map divided into 19 areas, each represented by its own RF model. Highways are depicted in red and intersections in green.
  • ...and 6 more figures
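
The encoder and decoder steps described in Figures 2 and 3 can be sketched with NumPy, leaving out the video codec. This is a minimal sketch under stated assumptions, not the authors' DS algorithm. The segment masks are taken as given boolean arrays (in the paper they come from FastSAM and YOLOv11). Keeping a FastSAM segment when its mean absolute CAV/RF difference exceeds `diff_thresh` is our assumption. We also assume the decoder recovers the mask by treating near-black pixels as background. All function names and thresholds are hypothetical.

```python
import numpy as np


def delta_mask(cav, rf, fastsam_masks, yolo_masks, diff_thresh=30.0):
    """Combine segment masks into one boolean Delta mask of shape H x W.

    Assumed rule (hypothetical): keep a FastSAM segment if the mean absolute
    CAV/RF difference inside it exceeds `diff_thresh`; always keep YOLO masks
    for critical classes such as pedestrians and vehicles.
    """
    diff = np.abs(cav.astype(np.float32) - rf.astype(np.float32)).mean(axis=-1)
    mask = np.zeros(diff.shape, dtype=bool)
    for seg in fastsam_masks:
        seg = seg.astype(bool)
        if seg.any() and diff[seg].mean() > diff_thresh:
            mask |= seg
    for obj in yolo_masks:
        mask |= obj.astype(bool)  # overlay critical-class masks onto FastSAM result
    return mask


def make_delta_frame(cav, mask):
    """Encoder side: keep CAV pixels inside the mask, set the rest to black."""
    return np.where(mask[..., None], cav, 0).astype(cav.dtype)


def reconstruct(delta, rf, black_thresh=8):
    """Decoder side: paste non-black Delta pixels over the RF-frame.

    Assumes the mask is inferred from near-black pixels, because lossy coding
    rarely leaves exact zeros in the background.
    """
    keep = delta.max(axis=-1) > black_thresh
    return np.where(keep[..., None], delta, rf)
```

One consequence of the near-black assumption is that genuinely dark foreground pixels would be replaced by the RF-frame. A real system could avoid this, for example by also signaling the mask to the decoder.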