Query Time Optimized Deep Learning Based Video Inference System

Mingren Shen, Shuoxuan Dong, Xiuyuan He

TL;DR

This work addresses the latency and cost challenges of video inference by extending the Focus framework with feature-map reuse across a cheap ingest CNN and a more expensive query CNN. By sharing intermediate representations from common network blocks (e.g., the first 4 or 7 blocks of ResNet-50/152) and retraining the remaining layers, the approach achieves substantial query-time savings while controlling accuracy loss. Empirical results show 9.6% to 17.5% latency reductions depending on the reuse depth and retraining, with accuracy preserved in deeper cuts. The study demonstrates a practical, scalable path to low-cost, low-latency video queries and provides a foundation for broader exploration of cross-model layer sharing in CNN-based video inference.

Abstract

This project report describes how we tune Focus[1], a video inference system that provides low cost and low latency through a two-phase design. We reduce query time by saving the intermediate-layer outputs of the neural network, a trade-off that uses more storage space to save time. We demonstrate the scheme with prototype systems and show that it saves 20% of the query time. The code repository is at https://github.com/iphyer/CS744 FocousIngestOpt.
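The space-for-time idea can be illustrated with a minimal sketch. It assumes a network can be written as an ordered list of blocks whose first few blocks are shared by the ingest and query networks. The names `FeatureCache`, `run_blocks`, `scale`, and `shift` are hypothetical and are not taken from the project code, and the toy arithmetic blocks only stand in for real ResNet blocks. The retraining of the remaining layers described in the TL;DR is not modeled here.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def run_blocks(blocks: Sequence[Block], x: Vector) -> Vector:
    """Apply the blocks in order and return the final output."""
    for block in blocks:
        x = block(x)
    return x


class FeatureCache:
    """Stores the output of the shared prefix blocks, keyed by frame id."""

    def __init__(self, shared_blocks: Sequence[Block]) -> None:
        self.shared_blocks = list(shared_blocks)
        self.store: Dict[str, Vector] = {}

    def ingest(self, frame_id: str, frame: Vector) -> None:
        # Ingest time: run the shared prefix once and keep its feature map.
        self.store[frame_id] = run_blocks(self.shared_blocks, frame)

    def query(self, frame_id: str, query_tail: Sequence[Block]) -> Vector:
        # Query time: skip the shared prefix and run only the remaining blocks.
        return run_blocks(query_tail, self.store[frame_id])


# Toy stand-ins for network blocks (hypothetical, not real ResNet layers).
def scale(factor: float) -> Block:
    return lambda x: [factor * v for v in x]


def shift(offset: float) -> Block:
    return lambda x: [v + offset for v in x]


shared = [scale(2.0), shift(1.0)]        # blocks common to both networks
query_tail = [scale(0.5), shift(-1.0)]   # remaining query-network blocks

cache = FeatureCache(shared)
frame = [1.0, 2.0, 3.0]
cache.ingest("frame-0", frame)

cached_result = cache.query("frame-0", query_tail)     # runs 2 blocks
full_result = run_blocks(shared + query_tail, frame)   # runs all 4 blocks
```

In this sketch the cached path and the full path return the same result because the shared blocks are identical in both. The stored feature maps are the extra space, and the skipped prefix blocks are the time saved at query.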

Paper Structure (19 sections, 7 figures, 1 table)

This paper contains 19 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Project System Design
  • Figure 2: Background Subtraction and Generated Patches
  • Figure 3: ResNet architecture and two different cuts. The red cut reuses 4 blocks and the blue cut reuses 7 blocks.
  • Figure 4: Training Curves of ResNet-50
  • Figure 5: Training Curves of ResNet-152
  • ...and 2 more figures