Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation

Mingxuan Yan, Yi Wang, Xuedou Xiao, Zhiqing Luo, Jianhua He, Wei Wang

TL;DR

The paper addresses pixel-level video semantic segmentation (VSS) on resource-constrained IoT devices via edge-assisted inference, where dynamic video content makes both model accuracy and segment bitrate fluctuate. It proposes Penance, a lightweight framework that jointly optimizes edge model selection and video compression settings by leveraging VSS softmax outputs, the prediction mechanism of H.264/AVC codecs, and a CRL Adapter based on deep reinforcement learning (DRL), all runnable on CPU-only devices. The key contributions are (i) a content-aware bitrate estimator aligned with codec prediction mechanics, (ii) a performance encoder that captures runtime VSS dynamics from softmax outputs, and (iii) a constrained reinforcement learning (CRL) policy that minimizes edge cost while satisfying bandwidth and accuracy constraints. In experiments on a commercial CPU-only IoT device, Penance consumes 6.8% more computation resources than the optimal strategy while keeping the rate of constraint violations low. The approach targets scalable, low-cost edge VSS deployments for IoT environments and is reported to be robust to content dynamics and to generalize across encoding configurations.

Abstract

Offloading computation to edge servers is a promising way to support the growing number of video understanding applications on resource-constrained IoT devices. Recent efforts have sought to make such systems more scalable by reducing inference costs on edge servers. However, existing research is not directly applicable to pixel-level vision tasks such as video semantic segmentation (VSS), partly because dynamic video content causes both VSS accuracy and segment bitrate to fluctuate. In response, we present Penance, a new framework for reducing edge inference cost. By exploiting the softmax outputs of VSS models and the prediction mechanism of H.264/AVC codecs, Penance optimizes model selection and compression settings to minimize inference cost while meeting the required accuracy within the available bandwidth. We implement Penance on a commercial CPU-only IoT device. Experimental results show that Penance consumes a negligible 6.8% more computation resources than the optimal strategy while satisfying the accuracy and bandwidth constraints with a low failure rate.
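The selection problem described in the abstract (pick a model and compression setting that minimizes edge cost while meeting an accuracy target within the available bandwidth) can be illustrated with a minimal brute-force sketch. This is not Penance's method, which the summary describes as a constrained reinforcement learning policy. Every name, field, and number below (`Config`, `cheapest_feasible`, `CANDIDATES`, the costs, accuracies, and bitrates) is a hypothetical placeholder chosen for illustration, and the sketch assumes that accuracy and bitrate predictions for each candidate are already available.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    """One candidate (model, compression) setting. All fields are illustrative."""
    model: str            # hypothetical edge model name
    qp: int               # H.264 quantization parameter
    scale: float          # resolution scaling factor
    cost: float           # assumed edge inference cost (arbitrary units)
    pred_accuracy: float  # assumed predicted accuracy for the next segment
    pred_bitrate: float   # assumed predicted bitrate in kbps


def cheapest_feasible(
    configs: Iterable[Config], min_accuracy: float, max_bitrate: float
) -> Optional[Config]:
    """Return the lowest-cost config meeting both constraints, or None."""
    feasible = [
        c for c in configs
        if c.pred_accuracy >= min_accuracy and c.pred_bitrate <= max_bitrate
    ]
    # min() with default=None handles the case where nothing is feasible.
    return min(feasible, key=lambda c: c.cost, default=None)


# Made-up candidates: larger models and lighter compression cost more
# but are assumed to give higher accuracy.
CANDIDATES = [
    Config("small", qp=30, scale=0.5, cost=10.0, pred_accuracy=0.60, pred_bitrate=400.0),
    Config("small", qp=24, scale=1.0, cost=25.0, pred_accuracy=0.70, pred_bitrate=1200.0),
    Config("large", qp=30, scale=0.5, cost=40.0, pred_accuracy=0.72, pred_bitrate=400.0),
    Config("large", qp=24, scale=1.0, cost=90.0, pred_accuracy=0.80, pred_bitrate=1200.0),
]

tight = cheapest_feasible(CANDIDATES, min_accuracy=0.70, max_bitrate=1000.0)
loose = cheapest_feasible(CANDIDATES, min_accuracy=0.70, max_bitrate=1500.0)
```

With these placeholder numbers, a tight bandwidth budget forces the larger model on a heavily compressed stream, while a looser budget lets the smaller model run on a higher-quality stream at lower cost. Exhaustive search like this presumes reliable per-segment predictions; the abstract's point is that dynamic content makes both accuracy and bitrate fluctuate, which is what motivates a content-aware approach.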

Paper Structure (34 sections, 12 equations, 15 figures, 2 tables)

This paper contains 34 sections, 12 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Showcase of the VSS accuracy fluctuation
  • Figure 2: Bandwidth usage distributions
  • Figure 3: MAE of the reprofiled accuracy functions in the following 20 seconds
  • Figure 4: Compare bitrate efficiencies of QP and resolution
  • Figure 5: FLOPs of different resolution scaling factors
  • ...and 10 more figures