Optimizing ROI Benefits Vehicle ReID in ITS
Mei Qiu, Lauren Ann Christopher, Lingxi Li, Stanley Chien, Yaobin Chen
TL;DR
This work investigates whether regions of interest (ROIs) derived from vehicle-detection confidence can improve vehicle re-identification (ReID) in Intelligent Transportation Systems (ITS). Using YOLOv8 for detection and DeepSORT for tracking, the authors compare feature consistency for images cropped inside versus outside ROIs across eight cameras and two non-overlapping camera pairs, evaluating four pre-trained backbones (ResNet50, ResNeXt50, ViT, Swin). Feature consistency is assessed with cosine similarity, information entropy, and t-SNE clustering variance. Inside-ROI features are more consistent, with the largest gaps in night conditions and cross-camera scenarios (e.g., Swin at night: 0.7842 inside vs. 0.5 outside; ViT cross-camera: 0.75 inside-inside vs. 0.52 inside-outside), suggesting that ROI-guided cropping can enhance ReID performance in ITS. The dataset is relatively small, so broader validation and a study of scalability are left to future work.
Abstract
Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. It is crucial for Intelligent Transportation Systems (ITS), where ReID effectiveness depends on the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidence scores, can improve feature matching and ReID. Using our framework with multiple Regions of Interest (ROIs) and lane-wise vehicle counts, we employed YOLOv8 for detection and DeepSORT for tracking across twelve Indiana highway videos, including two pairs of videos from non-overlapping cameras. Tracked vehicle images were cropped from inside and outside the ROIs at five-frame intervals. Features were extracted using four pre-trained models: ResNet50, ResNeXt50, Vision Transformer, and Swin-Transformer. Feature consistency was assessed through cosine similarity, information entropy, and clustering variance. Pairs of images both cropped inside ROIs had higher mean cosine similarity than pairs with one image inside and one outside the ROIs. The largest differences were observed during night conditions (0.7842 inside vs. 0.5 outside the ROI with Swin-Transformer) and in cross-camera scenarios (0.75 inside-inside vs. 0.52 inside-outside the ROI with Vision Transformer). Information entropy and clustering variance further supported that features from inside the ROIs are more consistent. These findings suggest that strategically selected ROIs can enhance tracking performance and ReID accuracy in ITS.
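The inside-inside versus inside-outside comparison of mean cosine similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cosine_similarity`, `mean_within`, `mean_between`) and the toy 3-D vectors are assumptions made for illustration, standing in for backbone embeddings of one tracked vehicle, and the exact pairing scheme used in the study is not specified here.

```python
import math
from itertools import combinations, product


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_within(features):
    """Mean cosine similarity over all unordered pairs within one group."""
    sims = [cosine_similarity(u, v) for u, v in combinations(features, 2)]
    return sum(sims) / len(sims)


def mean_between(group_a, group_b):
    """Mean cosine similarity over all pairs with one vector from each group."""
    sims = [cosine_similarity(u, v) for u, v in product(group_a, group_b)]
    return sum(sims) / len(sims)


# Hypothetical toy "embeddings" for one tracked vehicle (made-up numbers,
# not taken from the paper). Real features would come from a backbone
# such as ResNet50 or Swin-Transformer and have far more dimensions.
inside_roi = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.05, 0.0]]
outside_roi = [[0.2, 0.9, 0.1], [0.1, 0.3, 0.9]]

inside_inside = mean_within(inside_roi)
inside_outside = mean_between(inside_roi, outside_roi)

print(f"inside-inside mean cosine similarity:  {inside_inside:.4f}")
print(f"inside-outside mean cosine similarity: {inside_outside:.4f}")
```

In this toy setup the inside-ROI vectors are nearly parallel, so their within-group mean is higher than the inside-outside mean, mirroring the direction of the reported result. With real embeddings, a vectorized library routine would normally replace the pure-Python loops.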
