Enhancing Surveillance Camera FOV Quality via Semantic Line Detection and Classification with Deep Hough Transform
Andrew C. Freeman, Wenjing Shi, Bin Hwang
TL;DR
The paper tackles the automated evaluation of camera field-of-view (FOV) quality in surveillance and autonomous systems, where misaligned FOVs hinder downstream detection. It extends the Deep Hough Transform to perform semantic line detection and classification across five store-related classes, using per-class outputs and a fast Numba-based Hough kernel for efficiency. On the EgoCart dataset, the method achieves an F1 score of 0.729 for semantic line detection and 83.8% accuracy for FOV classification, indicating that semantic line quality can serve as a proxy for higher-level vision performance. The approach provides an automated, depth-free measure of camera pose quality, with practical implications for camera calibration and deployment and for downstream tasks such as inventory tracking and surveillance analytics.
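The central step of a Deep Hough Transform is a voting operation that sums deep features along every candidate line into a (θ, ρ) parameter grid; with per-class outputs, each semantic class keeps its own channel of that grid. The following is a minimal NumPy sketch of this aggregation. It assumes a (C, H, W) feature map with one channel per class, lines parameterized as x cos θ + y sin θ = ρ with coordinates measured from the image centre, and nearest-bin quantization of ρ. The function name `deep_hough_transform` and its parameters are hypothetical, and the sketch uses plain NumPy rather than the authors' Numba kernel.

```python
import numpy as np


def deep_hough_transform(features, num_angles=180, num_rho=None):
    """Hypothetical sketch: aggregate a (C, H, W) feature map into a
    (C, num_angles, num_rho) Hough map, one channel per semantic class.

    Bin (theta, rho) sums the features of every pixel on the line
    x*cos(theta) + y*sin(theta) = rho, with (x, y) measured from the
    image centre (an assumption, not taken from the paper).
    """
    C, H, W = features.shape
    diag = np.hypot(H, W)
    if num_rho is None:
        num_rho = int(np.ceil(diag)) + 1
    rho_res = diag / (num_rho - 1)  # width of one rho bin
    thetas = np.linspace(0.0, np.pi, num_angles, endpoint=False)

    # Pixel coordinates relative to the image centre, flattened in C order.
    ys, xs = np.mgrid[0:H, 0:W]
    xs = (xs - (W - 1) / 2.0).ravel()
    ys = (ys - (H - 1) / 2.0).ravel()
    flat = features.reshape(C, -1)

    hough = np.zeros((C, num_angles, num_rho))
    for i, theta in enumerate(thetas):
        rho = xs * np.cos(theta) + ys * np.sin(theta)
        # |rho| < diag / 2, so the shifted indices fall in [0, num_rho - 1].
        idx = np.rint((rho + diag / 2.0) / rho_res).astype(np.int64)
        for c in range(C):
            # Each pixel votes its class-c feature value into one rho bin.
            hough[c, i] = np.bincount(idx, weights=flat[c], minlength=num_rho)
    return hough, thetas


# Example: five class channels, with a single horizontal line drawn in class 2.
features = np.zeros((5, 33, 33))
features[2, 5, :] = 1.0
hough, thetas = deep_hough_transform(features)
t_idx, r_idx = np.unravel_index(np.argmax(hough[2]), hough[2].shape)
print(round(float(np.degrees(thetas[t_idx])), 1))  # prints 90.0
print(hough[2].max())  # prints 33.0
```

A horizontal line concentrates all of its 33 votes in a single bin at θ = 90°, while other classes stay empty. Because every pixel votes exactly once per angle, each row of a channel's Hough map sums to that channel's total activation, which is a convenient sanity check. A compiled kernel (for example, one JIT-compiled with Numba) would typically replace the Python loop over angles; how the authors structure theirs is not specified here.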
Abstract
The quality of recorded videos and images depends significantly on the camera's field of view (FOV). In critical applications such as surveillance systems and self-driving cars, an inadequate FOV can cause severe safety and security problems, including car accidents and thefts, when individuals and objects go undetected. Conventional methods for establishing the correct FOV rely heavily on human judgment and lack automated mechanisms for assessing video and image quality based on FOV. In this paper, we introduce an approach that combines semantic line detection and classification with the deep Hough transform to identify semantic lines, and uses parallel lines to understand the 3D structure of the view, thereby helping to ensure a suitable FOV. Our approach achieves an F1 score of 0.729 on the public EgoCart dataset, together with a high median score on the line placement metric. We show that our method offers a straightforward means of assessing the quality of a camera's field of view, achieving a classification accuracy of 83.8%. This metric can serve as a proxy for the expected performance of applications that depend on video and image quality.
