Enhancing Surveillance Camera FOV Quality via Semantic Line Detection and Classification with Deep Hough Transform

Andrew C. Freeman, Wenjing Shi, Bin Hwang

TL;DR

The paper tackles automated evaluation of camera field-of-view (FOV) quality in surveillance and autonomous systems, where misaligned FOVs hinder downstream detection. It extends the Deep Hough Transform to perform semantic line detection and classification across five store-related classes, using per-class outputs and a fast Numba-based Hough kernel for efficiency. On EgoCart data, the method achieves a 0.729 F1 score for semantic line detection and 83.8% accuracy for FOV classification, demonstrating that semantic line quality can serve as a proxy for higher-level vision performance. This approach provides an automated, depth-free measure of camera pose quality with practical implications for camera calibration, deployment, and downstream tasks like inventory tracking and surveillance analytics.
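The TL;DR mentions a fast Numba-based Hough kernel. The core Deep Hough Transform operation sums deep features along every candidate line into a (θ, ρ) grid. The sketch below shows one way to do that in plain NumPy. It is not the authors' kernel. The function name `deep_hough_transform`, the bin counts, and the centre-origin normal form ρ = x·cos θ + y·sin θ are illustrative assumptions. A Numba version would typically replace `np.bincount` with explicit pixel loops compiled by `numba.njit`.

```python
import numpy as np

def deep_hough_transform(features, num_angles=100, num_rhos=101):
    """Hypothetical sketch: sum a (C, H, W) feature map along all candidate
    lines, producing a (C, num_angles, num_rhos) Hough-space tensor.

    Lines use the normal form rho = x*cos(theta) + y*sin(theta), with (x, y)
    measured from the image centre, theta in [0, pi), and rho quantized into
    num_rhos evenly spaced bins over [-max_rho, max_rho].
    """
    features = np.asarray(features, dtype=np.float64)
    C, H, W = features.shape
    thetas = np.linspace(0.0, np.pi, num_angles, endpoint=False)
    max_rho = np.hypot(H, W) / 2.0

    # Pixel coordinates relative to the image centre, flattened to (H*W,).
    ys, xs = np.mgrid[0:H, 0:W]
    xs = xs.ravel() - (W - 1) / 2.0
    ys = ys.ravel() - (H - 1) / 2.0

    # rho of the line through every pixel at every angle: (num_angles, H*W).
    rhos = np.outer(np.cos(thetas), xs) + np.outer(np.sin(thetas), ys)
    # Map rho from [-max_rho, max_rho] to an integer bin index in [0, num_rhos).
    rho_idx = np.rint((rhos + max_rho) / (2.0 * max_rho) * (num_rhos - 1)).astype(np.int64)

    flat = features.reshape(C, H * W)
    hough = np.zeros((C, num_angles, num_rhos))
    for a in range(num_angles):
        for c in range(C):
            # Each pixel adds its feature value to the bin of the line it lies on.
            hough[c, a] = np.bincount(rho_idx[a], weights=flat[c], minlength=num_rhos)
    return hough, thetas, max_rho


# Usage: a horizontal line through the image centre gives one sharp peak.
feat = np.zeros((1, 11, 11))
feat[0, 5, :] = 1.0
hough, thetas, max_rho = deep_hough_transform(feat)
a, r = np.unravel_index(np.argmax(hough[0]), hough[0].shape)
print(thetas[a], r)  # angle near pi/2 and the centre rho bin
```

Because every pixel votes exactly once per angle, each angle's row of the Hough tensor sums to the total feature mass. This makes a cheap sanity check for a compiled kernel.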

Abstract

The quality of recorded videos and images depends heavily on the camera's field of view (FOV). In critical applications such as surveillance systems and self-driving cars, an inadequate FOV can lead to serious safety and security problems, including car accidents and thefts, when the system fails to detect people and objects. Conventional methods for establishing the correct FOV rely heavily on human judgment and lack automated mechanisms for assessing video and image quality based on FOV. In this paper, we introduce an approach that combines semantic line detection and classification with the Deep Hough Transform to identify semantic lines, and assesses whether the FOV is suitable by inferring the 3D structure of the scene from parallel lines. Our approach achieves an F1 score of 0.729 on the public EgoCart dataset, together with a notably high median score on the line placement metric. We show that our method offers a straightforward way to assess the quality of a camera's field of view, achieving a classification accuracy of 83.8%. This metric can serve as a proxy for the expected performance of downstream video and image analysis applications.
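The abstract reports an F1 score of 0.729 for semantic line detection. The matching criterion behind that number is not described in this section. The sketch below shows how precision, recall, and F1 can be computed when each image has at most one line per class, as in the single-strongest-prediction setup. The angle and distance tolerances in `lines_match` are hypothetical placeholders. Both helper names are illustrative, not from the paper.

```python
import math

def lines_match(pred, gt, max_dtheta=math.radians(5.0), max_drho=10.0):
    """Hypothetical tolerance test between two (theta, rho) lines, theta in [0, pi).

    (theta, rho) and (theta + pi, -rho) describe the same line, so angles near
    0 and near pi are compared with the sign of rho flipped.
    """
    dtheta = abs(pred[0] - gt[0])
    drho = abs(pred[1] - gt[1])
    if dtheta > math.pi / 2:
        dtheta = math.pi - dtheta
        drho = abs(pred[1] + gt[1])
    return dtheta <= max_dtheta and drho <= max_drho


def semantic_line_f1(predictions, ground_truths, match=lines_match):
    """Precision, recall, and F1 over a list of images.

    Each image is a dict mapping class name -> (theta, rho), with at most
    one line per class. A prediction is a true positive only if a ground-truth
    line of the same class exists and the two lines match.
    """
    tp = fp = fn = 0
    for pred, gt in zip(predictions, ground_truths):
        for cls, line in pred.items():
            if cls in gt and match(line, gt[cls]):
                tp += 1
            else:
                fp += 1  # wrong class, or right class in the wrong place
        for cls, line in gt.items():
            if not (cls in pred and match(pred[cls], line)):
                fn += 1  # ground-truth line that no prediction recovered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage with made-up lines (class names taken from the figure captions).
preds = [{"Aisle": (1.00, 5.0), "WallEndCap": (0.20, -3.0)}, {"Aisle": (3.10, -2.0)}]
gts = [{"Aisle": (1.02, 6.0)}, {"Aisle": (0.02, 2.5)}]
p, r, f1 = semantic_line_f1(preds, gts)  # precision = 2/3, recall = 1.0, F1 ~ 0.8
```

The second "Aisle" pair lies on opposite sides of the θ = 0 / θ = π wrap-around. It still counts as a match because of the sign flip on ρ.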

Paper Structure (19 sections, 2 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Our modified Deep Hough Transform architecture. We predict lines for multiple semantic classes and take only the single strongest prediction for each class.
  • Figure 2: Results on our test dataset. For visual clarity, the WallEndCap line is drawn at its original coordinates in the ground truth but trimmed at its intersection with the Aisle lines in the prediction images. Images (a) through (c) show good predictions for all lines. In images (d) through (f), we fail to predict the small WallEndCap line. Images (g) through (i) show major failure cases in which several lines are missed entirely. Image (j) shows a heavily skewed camera angle for which no classes are predicted.
  • Figure 3: Example camera positions illustrated on the store layout from the EgoCart dataset [spera2019egocart]. Red camera icons denote cameras whose ground-truth FOV is classified as bad according to the paper's FOV evaluation criteria, while green icons denote a good FOV.
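Figure 1 keeps only the single strongest prediction for each class. Figure 2 trims the WallEndCap line at its intersection with the Aisle lines. Both steps are simple operations on (θ, ρ) line parameters. The sketch below assumes a per-class Hough-space score tensor of shape (C, A, R), with ρ bins evenly spaced over [-max_rho, max_rho] from the image centre. Some captioned examples have no prediction for a class, so the sketch also applies a `min_score` rejection threshold. That threshold is a hypothetical addition: the paper's actual rule is not stated here. The names `strongest_line_per_class` and `intersect` are illustrative.

```python
import math
import numpy as np

def strongest_line_per_class(heatmap, thetas, max_rho, min_score=0.5):
    """Keep only the highest-scoring (theta, rho) bin for each class.

    heatmap: (C, A, R) per-class scores; rho bins are assumed to be evenly
    spaced over [-max_rho, max_rho]. Returns one (score, theta, rho) tuple per
    class, or None when the best score is below the hypothetical min_score.
    """
    heatmap = np.asarray(heatmap, dtype=np.float64)
    num_classes, num_angles, num_rhos = heatmap.shape
    rhos = np.linspace(-max_rho, max_rho, num_rhos)
    lines = []
    for c in range(num_classes):
        a, r = np.unravel_index(np.argmax(heatmap[c]), (num_angles, num_rhos))
        score = float(heatmap[c, a, r])
        lines.append((score, float(thetas[a]), float(rhos[r])) if score >= min_score else None)
    return lines


def intersect(line1, line2, eps=1e-9):
    """Intersect two (theta, rho) lines by solving
    x*cos(t) + y*sin(t) = rho for both with Cramer's rule.
    Returns None for (near-)parallel lines."""
    t1, r1 = line1
    t2, r2 = line2
    c1, s1, c2, s2 = math.cos(t1), math.sin(t1), math.cos(t2), math.sin(t2)
    det = c1 * s2 - s1 * c2
    if abs(det) < eps:
        return None
    return ((r1 * s2 - s1 * r2) / det, (c1 * r2 - r1 * c2) / det)


# Usage: 3 classes, 4 angle bins (0, pi/4, pi/2, 3pi/4), 5 rho bins (-10..10).
thetas = np.linspace(0.0, np.pi, 4, endpoint=False)
heatmap = np.zeros((3, 4, 5))
heatmap[0, 2, 3] = 0.9  # class 0: theta = pi/2, rho = 5  (horizontal line y = 5)
heatmap[1, 0, 1] = 0.8  # class 1: theta = 0,    rho = -5 (vertical line x = -5)
lines = strongest_line_per_class(heatmap, thetas, max_rho=10.0)
# lines[2] is None because class 2 never reaches min_score.
point = intersect(lines[0][1:], lines[1][1:])  # approximately (-5.0, 5.0)
```

The intersection point is in centre-origin coordinates. To trim a line segment in pixel space, add ((W - 1)/2, (H - 1)/2) back to it.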