FrameCorr: Adaptive, Autoencoder-based Neural Compression for Video Reconstruction in Resource and Timing Constrained Network Settings

John Li, Shehab Sarar Ahmed, Deepak Nair

TL;DR

The paper addresses the reconstruction of video frames when only partial data arrives under tight timing constraints in IoT-edge networks. It introduces FrameCorr, a neural approach that exploits inter-frame correlations to predict missing encoded content. FrameCorr builds on Progressive Neural Compression (PNC) and is compared against AVC and an adaptive bitrate (ABR) setup. Experiments on the UCF Sports Action dataset show that AVC excels with complete data, while PNC and FrameCorr enable reconstruction from partial data, with PNC frequently outperforming FrameCorr; FrameCorr’s limited gains are attributed to model simplicity and training alignment. The work highlights the trade-offs between traditional compression and neural methods, and the potential of integrating predictive reconstruction with adaptive bitrate strategies for resilient edge streaming.

Abstract

Despite the growing adoption of video processing on Internet of Things (IoT) devices due to their cost-effectiveness, transmitting captured data to nearby servers is challenging because timing constraints vary and network bandwidth is scarce. Existing video compression methods have difficulty recovering a frame when only part of the compressed data is received. Here, we introduce FrameCorr, a deep-learning-based solution that uses previously received data to predict the missing segments of a frame, enabling the frame to be reconstructed from partially received data.

Paper Structure (16 sections, 3 figures, 6 tables)

Figures (3)

  • Figure 1: The system model we consider for video transmission. A video capturing device captures frames and compresses them. The compressed frames are then transmitted over the wireless network to the edge server, where they are decoded to reconstruct frames as close to the originals as possible.
  • Figure 2: Every frame $x_i$ is compressed to yield $c_i$, which is then transmitted across the network. On reception at the server, PNC zero-pads the received data to match the dimensions of $c_i$. In contrast, we develop a separate deep-learning model, FrameCorr, that predicts the encoded representation of the current frame, denoted $\tilde{c}_i$, from the encoded representations of the preceding $K$ frames. The missing segments of $c_i$ are filled with the corresponding portions of $\tilde{c}_i$, and the resulting code is denoted $\hat{c}_i$.
  • Figure 3: The average number of bytes per frame in the encoded information across the 18 videos in the test set.
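
The two fill strategies described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the encoded frame $c_i$ is treated as a flat list of floats, the received data is assumed to be a prefix of $c_i$, and the function names (`zero_pad`, `fill_from_prediction`, `predict_from_history`) are hypothetical. In particular, `predict_from_history` is only a stand-in that averages the previous $K$ codes; FrameCorr uses a trained deep-learning model for this step.

```python
from typing import List, Sequence


def zero_pad(received: Sequence[float], full_len: int) -> List[float]:
    """PNC-style fill: keep the received prefix and pad the rest with zeros."""
    if len(received) > full_len:
        raise ValueError("received data is longer than the full code")
    return list(received) + [0.0] * (full_len - len(received))


def predict_from_history(history: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for the predictor of c~_i.

    Returns the element-wise mean of the previous K codes. The paper uses a
    trained model here; the mean is only a placeholder for illustration.
    """
    if not history:
        raise ValueError("need at least one previous code")
    k = len(history)
    return [sum(values) / k for values in zip(*history)]


def fill_from_prediction(
    received: Sequence[float], predicted: Sequence[float]
) -> List[float]:
    """FrameCorr-style fill: keep the received prefix of c_i and take the
    missing tail from the predicted code c~_i, giving c^_i."""
    if len(received) > len(predicted):
        raise ValueError("received data is longer than the predicted code")
    return list(received) + list(predicted[len(received):])


if __name__ == "__main__":
    # Codes of the K = 2 preceding frames (made-up numbers).
    history = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
    received = [2.5, 3.5]  # only the first half of c_i arrived in time

    c_tilde = predict_from_history(history)
    print("zero-padded:", zero_pad(received, len(c_tilde)))
    print("prediction-filled:", fill_from_prediction(received, c_tilde))
```

In both cases the received portion is left untouched; the strategies differ only in what replaces the part of the code that did not arrive before the deadline.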