Query Time Optimized Deep Learning Based Video Inference System

Mingren Shen; Shuoxuan Dong; Xiuyuan He

TL;DR

This work addresses the latency and cost challenges of video inference by extending the Focus framework with feature-map reuse across a cheap ingest CNN and a more expensive query CNN. By sharing intermediate representations from common network blocks (e.g., the first 4 or 7 blocks of ResNet-50/152) and retraining the remaining layers, the approach achieves substantial query-time savings while controlling accuracy loss. Empirical results show 9.6% to 17.5% latency reductions depending on the reuse depth and retraining, with accuracy preserved in deeper cuts. The study demonstrates a practical, scalable path to low-cost, low-latency video queries and provides a foundation for broader exploration of cross-model layer sharing in CNN-based video inference.
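
The reuse scheme in the TL;DR can be sketched in a few lines. This is a minimal, framework-free illustration under stated assumptions, not the report's implementation. The "layers" are toy Python functions standing in for ResNet blocks, and the names `FeatureMapStore`, `ingest`, `query`, `run`, and `layer_calls` are hypothetical. The shared trunk plays the role of the first k blocks common to both CNNs. Its output is cached at ingest time, so a query only has to run the remaining layers.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply layers in order; a toy stand-in for a CNN forward pass."""
    for layer in layers:
        x = layer(x)
    return x


class FeatureMapStore:
    """Hypothetical ingest/query pipeline that reuses shared-trunk outputs.

    shared_trunk: the first k blocks common to the ingest and query CNNs
                  (k = 4 or 7 in the report's ResNet cuts).
    ingest_head:  remaining layers of the cheap ingest CNN.
    query_head:   remaining (retrained) layers of the expensive query CNN.
    """

    def __init__(self, shared_trunk: Sequence[Layer],
                 ingest_head: Sequence[Layer],
                 query_head: Sequence[Layer]) -> None:
        self.shared_trunk = list(shared_trunk)
        self.ingest_head = list(ingest_head)
        self.query_head = list(query_head)
        self.cache: Dict[int, Vector] = {}  # frame id -> cached feature map
        self.layer_calls = 0  # layers executed so far, a crude cost counter

    def _run(self, layers: Sequence[Layer], x: Vector) -> Vector:
        self.layer_calls += len(layers)
        return run(layers, x)

    def ingest(self, frame_id: int, frame: Vector) -> Vector:
        """Run the shared trunk once, store its output, then the ingest head."""
        features = self._run(self.shared_trunk, frame)
        self.cache[frame_id] = features  # extra storage is spent here
        return self._run(self.ingest_head, features)

    def query(self, frame_id: int) -> Vector:
        """Answer a query by running only the query head on the cached map."""
        return self._run(self.query_head, self.cache[frame_id])


# Toy layers used only for illustration.
def double(v: Vector) -> Vector:
    return [2 * a for a in v]


def inc(v: Vector) -> Vector:
    return [a + 1 for a in v]


if __name__ == "__main__":
    store = FeatureMapStore([double, inc], [inc], [double, double])
    print(store.ingest(0, [1.0, 2.0]))  # [4.0, 6.0]
    print(store.query(0))               # [12.0, 20.0], from the cached map
    print(store.layer_calls)            # 5
```

In this toy run a query executes two layers instead of the four that an uncached query CNN would run, while `run([double, inc, double, double], [1.0, 2.0])` gives the same result. The price is the `cache` dictionary, which grows with every ingested frame. In the report the query CNN's remaining layers are retrained to accept the shared features; here the heads compose with the trunk by construction.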

Abstract

This project report describes how we tune Focus [1], a video inference system that achieves low cost and low latency by splitting work into two phases, ingest time and query time. We reduce query time by saving the intermediate-layer outputs (feature maps) of the neural network and reusing them at query time, a trade-off that spends storage space to save time. We demonstrate the scheme with a prototype system, in which it reduces query time by 20%. The code repository is at https://github.com/iphyer/CS744 FocousIngestOpt.
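
The space-for-time trade can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement from the report. It assumes float32 feature maps and uses a 256×56×56 map (the `layer1` output shape of torchvision's ResNet-50 on a 224×224 input) purely as an illustrative size. It also assumes query time scales with the fraction of the query CNN that is skipped, plus a fixed cost to load the cached maps. The function names and all numeric inputs are hypothetical.

```python
def feature_map_bytes(channels: int, height: int, width: int,
                      bytes_per_value: int = 4) -> int:
    """Bytes needed to store one C x H x W feature map (float32 by default)."""
    return channels * height * width * bytes_per_value


def storage_gib(num_patches: int, channels: int, height: int, width: int,
                bytes_per_value: int = 4) -> float:
    """Total cache size in GiB when one feature map is stored per patch."""
    total = num_patches * feature_map_bytes(channels, height, width, bytes_per_value)
    return total / 2**30


def query_time_with_reuse(full_query_time: float, trunk_fraction: float,
                          load_time: float) -> float:
    """Estimated query time when the shared trunk is skipped.

    full_query_time: time to run the whole query CNN without reuse.
    trunk_fraction:  share of that time spent in the reused blocks (0..1).
    load_time:       time to read the cached feature maps back from storage.
    """
    if not 0.0 <= trunk_fraction <= 1.0:
        raise ValueError("trunk_fraction must be in [0, 1]")
    return full_query_time * (1.0 - trunk_fraction) + load_time


if __name__ == "__main__":
    print(feature_map_bytes(256, 56, 56))             # 3211264 bytes per patch
    print(storage_gib(1024, 256, 56, 56))             # 3.0625 GiB for 1024 patches
    print(query_time_with_reuse(200.0, 0.25, 20.0))   # 170.0
```

The model makes the trade explicit: reuse pays off only while the load time stays below the compute time of the skipped blocks, and the cache grows linearly with both the number of stored patches and the size of the feature map at the chosen cut. The cut point therefore affects both sides of the trade.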

Paper Structure

This paper contains 19 sections, 7 figures, and 1 table.

Introduction
Background
Video Inference System
Convolutional Neural Networks
Traffic Video Characteristic
Feature Map
Design
Dataset Generation
Building Baseline
Feature Map Reuse
Overall Experiment Design
Evaluation
Related Work
Object Detection
Other Video Inference Systems
(4 further sections not listed)

Figures (7)

Figure 1: Project System Design
Figure 2: Background Subtraction and Generated Patches
Figure 3: ResNet architecture and two different cuts. The red cut reuses 4 blocks and the blue cut reuses 7 blocks.
Figure 4: Training Curves of ResNet-50
Figure 5: Training Curves of ResNet-152
(2 further figures not listed)
