Optimizing ROI Benefits Vehicle ReID in ITS

Mei Qiu, Lauren Ann Christopher, Lingxi Li, Stanley Chien, Yaobin Chen

TL;DR

This work investigates whether regions of interest (ROIs) derived from vehicle-detection confidence can improve vehicle re-identification (ReID) in Intelligent Transportation Systems (ITS). Using YOLOv8 for detection and DeepSORT for tracking, the authors compare feature consistency for images cropped inside versus outside ROIs across eight cameras and two non-overlapping camera pairs, evaluating four backbones (ResNet50, ResNeXt50, ViT, Swin). Feature consistency is assessed with cosine similarity, information entropy, and t-SNE clustering variance. The results show that inside-ROI features are more consistent, with notable gains in night conditions and cross-camera scenarios (e.g., Swin-night: 0.7842 inside vs 0.5 outside; ViT cross-camera: 0.75 inside-inside vs 0.52 inside-outside), suggesting ROI-guided cropping can enhance ReID performance in ITS. Limitations include a relatively small dataset, indicating a need for broader validation and scalability in future work.

Abstract

Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. This is crucial for Intelligent Transportation Systems (ITS), where ReID effectiveness is influenced by the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidence scores, can enhance feature matching and ReID tasks. Using our framework with multiple Regions of Interest (ROIs) and lane-wise vehicle counts, we employed YOLOv8 for detection and DeepSORT for tracking across twelve Indiana highway videos, including two pairs of videos from non-overlapping cameras. Tracked vehicle images were cropped from inside and outside the ROIs at five-frame intervals. Features were extracted using pre-trained models: ResNet50, ResNeXt50, Vision Transformer, and Swin-Transformer. Feature consistency was assessed through cosine similarity, information entropy, and clustering variance. Results showed that pairs of images both cropped inside ROIs had higher mean cosine similarity than pairs with one image inside and one outside the ROIs. The most significant difference was observed during night conditions (0.7842 inside vs. 0.5 outside the ROI with Swin-Transformer) and in cross-camera scenarios (0.75 inside-inside vs. 0.52 inside-outside the ROI with Vision Transformer). Information entropy and clustering variance further supported that features in ROIs are more consistent. These findings suggest that strategically selected ROIs can enhance tracking performance and ReID accuracy in ITS.
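The inside-inside versus inside-outside comparison described in the abstract can be illustrated with a minimal sketch. It assumes each cropped image has already been reduced to a feature vector by one of the pre-trained backbones; the helper names (`cosine_similarity`, `mean_pair_similarity`) and the toy 3-D vectors are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_pair_similarity(pairs):
    """Mean cosine similarity over a list of (feature, feature) pairs."""
    return mean(cosine_similarity(a, b) for a, b in pairs)


# Toy stand-ins for features of the same vehicle (illustrative values only).
in_roi_a = [1.0, 0.0, 0.0]   # crop taken inside the ROI
in_roi_b = [1.0, 1.0, 0.0]   # another crop taken inside the ROI
out_roi = [0.0, 1.0, 0.0]    # crop taken outside the ROI

inside_inside = mean_pair_similarity([(in_roi_a, in_roi_b)])
inside_outside = mean_pair_similarity([(in_roi_a, out_roi)])
print(f"inside-inside: {inside_inside:.4f}, inside-outside: {inside_outside:.4f}")
```

In practice each list of pairs would hold many crops per tracked vehicle, and the two means would be compared per model and per condition.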

Paper Structure (7 sections, 12 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Data generation pipeline. Within-camera matching identifies and matches vehicles inside and outside the Region of Interest (ROI) in a single camera view. Cross-camera matching matches vehicles across different camera views, detecting them both inside and outside the ROI and matching vehicles in ROIs. Feature extraction and analysis pipeline: for the pre-trained ResNet and ResNeXt models, global features are extracted after the global average pooling (GAP) layer; for the pre-trained Vision Transformer (ViT) and Swin Transformer, global features are extracted after the Multi-Layer Perceptron (MLP) layer. Cosine similarity, clustering, information entropy, and t-SNE are used to analyze the features' characteristics. Best viewed in color.
  • Figure 2: Two vehicle examples from different camera views: For each vehicle, in-ROI and out-ROI images within the same camera view exhibit differences in size, aspect ratios, resolutions, and quality.
  • Figure 3: Cosine similarity analysis was conducted using four pre-trained models (ResNet50, ResNeXt50, ViT, Swin-Transformer) across eight camera datasets under four conditions: sunny, rainy, night, and congestion. Features were extracted from the last layer before the fully connected (FC) layer. Cosine similarity was calculated for vehicle image pairs within the region of interest (in-ROI) and for pairs with one image in-ROI and the other out-ROI. T-tests were performed to determine significant differences in features between in-ROI and out-ROI, with a p-value threshold of 0.05. The T-test results from these models show significant differences between in-ROI and out-ROI features, with in-ROI vehicle features having higher similarities than those of out-ROI, as shown in (a), (b), (c), and (d). '*' means the T-test result is significant.
  • Figure 4: (a) Average Information Entropy Across Conditions: Analyzed within a single camera view, the average information entropy for in-ROI and out-ROI features from four models (ResNet50, ResNeXt50, ViT, and Swin-Transformer) across eight cameras and four conditions (sunny, rainy, night, and congestion). Lower entropy signifies fewer variations in feature distribution. ViT exhibits the lowest entropy under all conditions, indicating robustness. Within each model, there is no significant difference in average entropy between in-ROI and out-ROI features. (b) RMSE of Clustering Variance Across Conditions: After feature extraction, 2D t-SNE visualizations are generated and clustering variance is calculated. Lower variance indicates more consistent features. For all four models, in-ROI features are more consistent than out-ROI features under challenging ITS conditions such as night and congestion.
  • Figure 5: T-test of Cosine Similarity Across Cameras: Cosine similarity was analyzed across four non-overlapping highway cameras: (a) the first pair of cameras, and (b) the second pair of cameras. The cosine similarity of the same vehicle's features was compared between cam1-inROI and cam2-inROI, and between cam1-inROI and cam2-outROI, using a p-value threshold of 0.05. Except for the first camera pair (a), features extracted by Swin-Transformer and other models showed significantly higher cosine similarity for inROI-inROI pairs compared to inROI-outROI pairs. '*' means the T-test result is significant.
  • ...and 2 more figures
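
The two consistency measures named in the Figure 4 caption can be sketched as follows. The captions do not give the exact definitions, so both functions are assumptions: entropy is taken here as the Shannon entropy of a fixed-bin histogram of feature values, and clustering variance as the mean squared distance of 2D points (for example, t-SNE outputs) from their centroid. The function names, the bin count, and the sample points are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=8):
    """Shannon entropy in bits of an equal-width histogram of the values.

    Assumption: the paper's exact entropy definition is not stated here.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # all values fall in a single bin
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cluster_variance(points):
    """Mean squared Euclidean distance of 2D points from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n


# Illustrative 2D embeddings of one vehicle's crops (made-up values).
tight_cluster = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.0)]
loose_cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

print(cluster_variance(tight_cluster) < cluster_variance(loose_cluster))
print(shannon_entropy([0.0, 1.0, 2.0, 3.0], bins=4))
```

Under these definitions, a lower value of either measure corresponds to the "more consistent features" reading used in the caption.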