Optimizing video analytics inference pipelines: a case study

Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O'Connell, Hans Vandierendonck

TL;DR

The paper tackles the high computational and cost burden of large-scale, multi-zone poultry welfare video analytics. It presents a real-world case study of FlockFocus and introduces system-level optimizations across detection, tracking, clustering, and behavior analysis, including batched GPU inference, efficient post-processing, and multi-level parallelism. Real-world evaluations show end-to-end speedups of about 2x and substantial cost savings, demonstrating practical strategies for deploying high-throughput, low-latency video analytics in agriculture and similar domains. The work provides a blueprint for optimizing multi-stage video pipelines beyond poultry, addressing scheduling, data flow, and inter-stage communication bottlenecks that frequently dominate performance in large-scale deployments.

Abstract

Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and the near-real-time monitoring needs of commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacing CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2x speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.

Paper Structure

This paper contains 19 sections, 13 figures, 2 tables, 1 algorithm.

Figures (13)

  • Figure 1: FlockFocus system architecture and per-zone video pipelines. The design leverages a unified set of modules, including chicken detection, tracking, and behaviour analysis, which are adapted for each zone’s behavioural targets. Rectangles with sharp corners denote models or algorithms, grey rounded rectangles denote input data and final outputs, and the text on arrows denotes intermediate data streams.
  • Figure 2: Effect of batch size on detection runtime and GPU memory utilization.
  • Figure 3: Breakdown of object detection module with U-Net before and after batching. While inference benefits from batching, the overall effect is obscured by costly preprocessing and post-processing steps.
  • Figure 4: Breakdown of detection latency into preprocessing, inference, and post-processing components, before and after optimization.
  • Figure 5: Detection time, GPU utilization, and GPU memory usage for varying numbers of parallel jobs. Speedup saturates after 4 workers due to increasing GPU resource contention.
  • ...and 8 more figures
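
The batching idea behind Figures 2 and 3 can be illustrated with a minimal sketch. It is not the FlockFocus implementation: the names `iter_batches`, `detect_batched`, and `CountingModel` are hypothetical, and the stand-in model only counts how often it is invoked. The point is that grouping frames reduces the number of model invocations, which is where batched GPU inference saves time, while a larger batch size costs more GPU memory.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(frames: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `frames`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(frames), batch_size):
        yield frames[start:start + batch_size]


def detect_batched(
    frames: Sequence,
    model: Callable[[Sequence], List],
    batch_size: int,
) -> List:
    """Run `model` once per batch and return per-frame results in input order."""
    results: List = []
    for batch in iter_batches(frames, batch_size):
        # One model call handles the whole batch instead of one call per frame.
        results.extend(model(batch))
    return results


class CountingModel:
    """Hypothetical stand-in for a detector: returns one placeholder per frame."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: Sequence) -> List[str]:
        self.calls += 1
        return [f"detections-for-frame-{frame}" for frame in batch]


frames = list(range(10))          # placeholder frame identifiers, not real images
model = CountingModel()
outputs = detect_batched(frames, model, batch_size=4)
print(len(outputs), "results from", model.calls, "model calls")
```

With ten frames and a batch size of 4, the stand-in model is called three times rather than ten. In a real pipeline the batch size would be tuned against available GPU memory, and, as the caption of Figure 3 notes, preprocessing and post-processing can still dominate if only inference is batched.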