Vision Transformer based Random Walk for Group Re-Identification

Guoqing Zhang; Tianqi Liu; Wenxuan Fang; Yuhui Zheng

TL;DR

The paper tackles group re-identification under dynamic group membership and layout changes by introducing a vision-transformer-based random-walk framework. It builds depth-aware graphs from monocular depth maps and refines them with a random-walk process, selecting the graph with the highest average affinity to the gallery. Inter-graph attention and a circle loss enable robust group matching across images, with ablations confirming the benefit of depth-based graph construction, transformer backbones, and modular components. Across RG, DG, and CSG datasets, the approach achieves state-of-the-art performance, demonstrating improved robustness to camera distance and group-layout variations for practical surveillance scenarios.

Abstract

Group re-identification (re-ID) aims to match groups containing the same people across different cameras; its main challenges are changes in group membership and in group layout. Most existing methods use the k-nearest neighbor algorithm to update node features and thereby account for membership changes, but they cannot handle layout changes. To this end, we propose a novel vision transformer based random walk framework for group re-ID. Specifically, we design a vision transformer based on a monocular depth estimation algorithm and construct a graph from the average depth value of the pedestrian features, so as to fully account for the effect of camera distance on the relationships among group members. In addition, we propose a random walk module that reconstructs the graph by calculating affinity scores between target and gallery images, removing pedestrians who do not belong to the current group. Experimental results show that our framework outperforms most existing methods.
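The random walk idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_walk_refine`, `prune_group`), the cosine affinity, the softmax transition matrix, the mixing weight `alpha`, and the fixed `threshold` are all assumptions chosen for illustration. The sketch scores each group member by its affinity to the gallery, smooths those scores by one random walk step over the gallery graph, and drops members whose average score is low.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def random_walk_refine(probe_to_gallery, gallery_to_gallery, alpha=0.5, steps=1):
    """Propagate probe-to-gallery affinities through the gallery graph.

    probe_to_gallery: (m, n) affinities; gallery_to_gallery: (n, n), n >= 2.
    alpha and steps are hypothetical hyperparameters, not taken from the paper.
    """
    w = np.array(gallery_to_gallery, dtype=float)  # copy, keep input untouched
    np.fill_diagonal(w, -np.inf)                   # forbid self-transitions
    w = np.exp(w - w.max(axis=1, keepdims=True))   # row-wise softmax
    w /= w.sum(axis=1, keepdims=True)              # each row sums to 1
    refined = probe_to_gallery
    for _ in range(steps):
        # Mix the propagated affinities with the original ones.
        refined = alpha * (refined @ w.T) + (1.0 - alpha) * probe_to_gallery
    return refined


def prune_group(member_feats, gallery_feats, threshold=0.5):
    """Return a keep-mask and the average refined affinity of each member."""
    members = l2_normalize(np.asarray(member_feats, dtype=float))
    gallery = l2_normalize(np.asarray(gallery_feats, dtype=float))
    refined = random_walk_refine(members @ gallery.T, gallery @ gallery.T)
    scores = refined.mean(axis=1)  # average affinity to the gallery
    return scores >= threshold, scores
```

Calling `prune_group(member_feats, gallery_feats)` with one feature row per cropped person returns a boolean mask over the group members; members flagged `False` would be removed before the graph is rebuilt. A learned affinity and a data-dependent threshold would likely replace the fixed choices used here.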

Paper Structure (12 sections, 11 equations, 5 figures, 4 tables)


Introduction
Related Work
Method
Overview
Random Walk
Group Matching
Experimental Results
Datasets and Experimental Settings
Compared with Other Group Re-ID Methods
Visualization of Results
Ablation Study
Conclusion

Figures (5)

Figure 1: Illustration of two methods of constructing the graph: (a) the k-nearest neighbor algorithm and (b) the random walk method, which can effectively remove pedestrians (red nodes) that do not belong to the group.
Figure 2: Illustration of our proposed framework for group re-identification. First, we take group pairs as input and use the monocular depth estimation algorithm to obtain the depth map. Second, we crop the single-person images and pass them through the vision transformer, embedding a token for each person's depth value, to obtain single-person features. Subsequently, we construct a context graph using the person features as nodes. Then, we calculate the affinity scores between all members in each graph and the gallery images. Finally, in the group matching module, the node features of the context graphs exchange messages between groups.
Figure 3: Example group images in the (a) Road Group, (b) DukeMTMC Group, and (c) CUHK-SYSU-Group datasets.
Figure 4: Visualization of the random walk module process.
Figure 5: Visualization of the top-5 ranking lists produced by our method on the CSG dataset. Images with green and red borders indicate correct and incorrect matches, respectively.
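
The first steps of the pipeline in Figure 2 (depth map, person crops, context graph) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how depth values are turned into graph edges, so the exponential edge weight, the `sigma` scale, the `(x1, y1, x2, y2)` box format, and both function names are hypothetical. The depth map is assumed to come from any monocular depth estimator.

```python
import numpy as np


def mean_depth_per_person(depth_map, boxes):
    """Average the depth map inside each (x1, y1, x2, y2) pixel box."""
    depths = [float(depth_map[y1:y2, x1:x2].mean()) for x1, y1, x2, y2 in boxes]
    return np.array(depths)


def depth_adjacency(depths, sigma=1.0):
    """Weight edges so that people at similar camera distance are linked strongly.

    The exp(-|d_i - d_j| / sigma) form is an assumed choice, not the paper's.
    """
    diff = np.abs(depths[:, None] - depths[None, :])  # pairwise depth gaps
    adj = np.exp(-diff / sigma)
    np.fill_diagonal(adj, 0.0)                        # no self-loops
    return adj
```

With per-person features as node attributes, `depth_adjacency(mean_depth_per_person(depth_map, boxes))` gives one possible weighted context graph in which a bystander far behind the group receives only weak edges to its members.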
