Tracking Passengers and Baggage Items Using Multiple Overhead Cameras at Security Checkpoints

Abubakar Siddique, Henry Medeiros

TL;DR

A self-supervised learning (SSL) technique gives the model information about instance segmentation uncertainty in overhead images, improving object detection accuracy by up to 42% without increasing the model's inference time.

Abstract

We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a self-supervised learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically transformed images as inputs to a convolutional neural network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multiview trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performance on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to 42% without increasing the inference time of the model. Our multicamera association method achieves up to 89% multiobject tracking accuracy with an average computation time of less than 15 ms.
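
The pseudo-label generation steps described in the abstract (detect on rotated inputs, remap to the original frame, cluster with mean shift) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `rotate_points`, `remap_box`, and `mean_shift_modes`, the rotation sign convention, the flat-kernel bandwidth, and the synthetic detections are all assumptions made for illustration. In particular, the paper regresses the clustered detections with the network itself, whereas this sketch simply averages the boxes in each cluster as a stand-in.

```python
import numpy as np

def rotate_points(points, theta_deg, center):
    """Rotate (N, 2) points by theta_deg about center (sign convention assumed)."""
    t = np.deg2rad(theta_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return (points - center) @ rot.T + center

def remap_box(box, theta_deg, center):
    """Map a box (x1, y1, x2, y2) found in a rotated image back to the
    original frame and return its enclosing axis-aligned box."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=float)
    back = rotate_points(corners, -theta_deg, center)
    return np.concatenate([back.min(axis=0), back.max(axis=0)])

def mean_shift_modes(points, bandwidth, n_iter=50, tol=1e-3):
    """Flat-kernel mean shift over (N, 2) points; returns (modes, labels)."""
    points = points.astype(float)
    shifted = points.copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(shifted[:, None, :] - points[None, :, :], axis=2)
        weights = (dists <= bandwidth).astype(float)
        new = weights @ points / weights.sum(axis=1, keepdims=True)
        converged = np.abs(new - shifted).max() < tol
        shifted = new
        if converged:
            break
    # Merge converged points that landed on (nearly) the same mode.
    modes, labels = [], np.empty(len(points), dtype=int)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels

# Synthetic stand-in for detector output: two objects seen at four rotations.
center = np.array([320.0, 240.0])          # hypothetical image center
true_centers = np.array([[100.0, 120.0], [400.0, 300.0]])
angles = [0, 90, 180, 270]

remapped = []
for c in true_centers:
    for theta in angles:
        rc = rotate_points(c[None, :], theta, center)[0]  # object in rotated view
        det = np.concatenate([rc - 20, rc + 20])          # 40x40 detection there
        remapped.append(remap_box(det, theta, center))    # back to original frame
remapped = np.array(remapped)

# Cluster remapped detections by box center, then average each cluster
# (averaging replaces the network-based regression used in the paper).
box_centers = (remapped[:, :2] + remapped[:, 2:]) / 2
modes, labels = mean_shift_modes(box_centers, bandwidth=30.0)
pseudo_boxes = np.array([remapped[labels == k].mean(axis=0)
                         for k in range(len(modes))])
```

In this toy setting every rotated detection of the same object remaps to the same location, so each cluster collapses to a single pseudo-label box. With a real detector the remapped boxes would scatter around each object, and the bandwidth would control how aggressively nearby detections are merged.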

Paper Structure

This paper contains 28 sections, 6 equations, 16 figures, 10 tables, 5 algorithms.

Figures (16)

  • Figure 1: Proposed SSL framework. The augmented proposal generation stage uses multiple rotated versions of the unlabeled input images to generate augmented detections from an instance segmentation model and then remaps these predictions into their original coordinates. The clustering algorithm leverages the model's regression ability to reduce localization errors using the augmented predictions as region proposals. The regressed cluster modes are then used to generate augmented pseudo-labels to update the model.
  • Figure 2: Visualization of our data augmentation approach. The first and second columns show the segmentation masks and detections at $\theta = 0^{\circ}$ and $\theta = 186^{\circ}$, respectively. The third column shows the remapped detections in the set $S^{\mathcal{C}}$ on the original image (using Alg. \ref{alg:augmentedproposals}) with the best detections (blue) from Alg. \ref{alg:clusterregression}.
  • Figure 3: Regression on test-time augmented bounding boxes (middle) and cluster modes (right) to generate pseudo-labels for SSL training.
  • Figure 4: Probability of occupancy of passengers at one frame of our evaluation datasets (Fig. \ref{fig:det_model_describe}).
  • Figure 5: Document checking station and divestiture area at the Kostas Research Institute simulated airport checkpoint.
  • ...and 11 more figures