Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference

Changmin Jeon, Seonjun Kim, Juheon Yi, Youngki Lee

TL;DR

Mondrian is an edge system that enables high-performance object detection on high-resolution video streams. It introduces Compressive Packed Inference, which minimizes per-pixel processing costs by selecting only the pixels that need to be processed and combining them to maximize processing parallelism.

Abstract

In this paper, we present Mondrian, an edge system that enables high-performance object detection on high-resolution video streams. Many lightweight models and system optimization techniques have been proposed for resource-constrained devices, but they do not fully utilize the potential of accelerators on dynamic, high-resolution videos. To enable such capability, we devise a novel Compressive Packed Inference that minimizes per-pixel processing costs by selectively determining the pixels that need to be processed and combining them to maximize processing parallelism. In particular, our system quickly extracts ROIs and dynamically shrinks them, reflecting the fast-changing characteristics of objects and scenes. It then combines the scaled ROIs into large canvases to maximize the utilization of inference accelerators such as GPUs. Evaluation across various datasets, models, and devices shows that Mondrian outperforms state-of-the-art baselines (e.g., input rescaling, ROI extraction, ROI extraction + batching) with 15.0-19.7% higher accuracy, and delivers 6.65$\times$ higher throughput than frame-wise inference when processing various 1080p video streams. We will release the code after the paper review.
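
The abstract describes shrinking ROIs and packing them into large canvases but does not specify the algorithm. As a rough illustration of the packing idea only, the sketch below uses a generic greedy shelf-packing heuristic, which is a swapped-in technique and not necessarily what Mondrian does. The `ROI` class, its `scale` field, and `pack_rois` are hypothetical names introduced here; the 1280-pixel canvas side is taken from the Figure 1 caption.

```python
from dataclasses import dataclass


@dataclass
class ROI:
    width: int     # ROI width in source-frame pixels
    height: int    # ROI height in source-frame pixels
    scale: float   # hypothetical per-ROI shrink factor in (0, 1]


def pack_rois(rois, canvas_size=1280):
    """Greedy shelf packing (illustrative, not the paper's algorithm).

    Returns a list of canvases; each canvas is a list of
    (roi_index, x, y, w, h) placements in canvas coordinates.
    """
    # Step 1: shrink every ROI by its own scale factor.
    scaled = []
    for index, roi in enumerate(rois):
        w = max(1, round(roi.width * roi.scale))
        h = max(1, round(roi.height * roi.scale))
        if w > canvas_size or h > canvas_size:
            raise ValueError(f"ROI {index} does not fit in the canvas")
        scaled.append((index, w, h))

    # Step 2: place the tallest ROIs first so each shelf wastes less height.
    scaled.sort(key=lambda item: item[2], reverse=True)

    canvases = []
    current = []
    x = y = shelf_height = 0
    for index, w, h in scaled:
        if x + w > canvas_size:
            # The current shelf is full: open a new shelf below it.
            y += shelf_height
            x = 0
            shelf_height = 0
        if y + h > canvas_size:
            # The current canvas is full: start a fresh one.
            canvases.append(current)
            current = []
            x = y = shelf_height = 0
        current.append((index, x, y, w, h))
        x += w
        shelf_height = max(shelf_height, h)

    if current:
        canvases.append(current)
    return canvases
```

Each resulting canvas can then be sent to the detector as a single input, and detections can be mapped back to their source frames through the stored placements. How the shrink factor is chosen per ROI is the part the paper addresses; this sketch simply takes it as given.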

Paper Structure (32 sections, 1 equation, 17 figures, 4 tables)


Figures (17)

  • Figure 1: Concept of Mondrian's Compressive Packed Inference. We extract ROIs from 20 FHD frames, scale and pack them into a single 1280$\times$1280 canvas without accuracy drop.
  • Figure 2: Example scenario of Mondrian: a four-way surveillance camera capturing a crowded public square.
  • Figure 3: Effect of input size on processing throughput. Pixel throughput is the number of pixels processed per second.
  • Figure 4: Overview of Compressive Packed Inference.
  • Figure 5: Motivational study on the spatio-temporal variation of Safe area (MTA dataset).
  • ...and 12 more figures
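
To put the numbers in the Figure 1 and Figure 3 captions in context, the short calculation below compares the raw pixel count of 20 FHD frames with that of a single 1280$\times$1280 canvas. It assumes FHD means 1920$\times$1080, and `pixel_throughput` is a hypothetical helper that follows the caption's definition (pixels processed per second). The pixel ratio is plain arithmetic, not a throughput result from the paper.

```python
FHD_WIDTH, FHD_HEIGHT = 1920, 1080   # assumption: FHD = 1920x1080
NUM_FRAMES = 20                      # frames packed in Figure 1
CANVAS_SIDE = 1280                   # canvas side from Figure 1


def pixel_throughput(width, height, inputs_per_second):
    """Pixels processed per second for inputs of the given size."""
    return width * height * inputs_per_second


# Pixels in the raw frames versus pixels in the single packed canvas.
source_pixels = NUM_FRAMES * FHD_WIDTH * FHD_HEIGHT
canvas_pixels = CANVAS_SIDE * CANVAS_SIDE
reduction = source_pixels / canvas_pixels

print(f"source pixels: {source_pixels}")
print(f"canvas pixels: {canvas_pixels}")
print(f"pixel reduction: {reduction:.2f}x")
```

The ratio only describes how many fewer pixels reach the model; the actual speedup depends on how the accelerator's throughput varies with input size, which is what Figure 3 examines.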