Looking Alike From Far to Near: Enhancing Cross-Resolution Re-Identification via Feature Vector Panning

Zanwu Liu, Chao Yuan, Bo Li, Xiaowei Zhang, Guanglin Niu

TL;DR

This paper tackles cross-resolution person ReID by showing that resolution differences induce a stable semantic direction in feature space. It introduces VPFA, a lightweight postprocessing module with a three-layer MLP VP block and a Vector Panning Loss to align LR features with HR representations without retraining the backbone. The approach relies on identity-level prototypes to learn a resolution-specific offset, yielding significant improvements over state-of-the-art CR-ReID methods while maintaining high efficiency. By validating across multiple datasets and even cross-modal scenarios, VPFA demonstrates strong practical impact for robust re-identification in real-world surveillance systems.

Abstract

In surveillance scenarios, varying camera distances cause significant differences among pedestrian image resolutions, making it hard to match low-resolution (LR) images with high-resolution (HR) counterparts, limiting the performance of Re-Identification (ReID) tasks. Most existing Cross-Resolution ReID (CR-ReID) methods rely on super-resolution (SR) or joint learning for feature compensation, which increases training and inference complexity and has reached a performance bottleneck in recent studies. Inspired by semantic directions in the word embedding space, we empirically discover that semantic directions implying resolution differences also emerge in the feature space of ReID, and we substantiate this finding from a statistical perspective using Canonical Correlation Analysis and Pearson Correlation Analysis. Based on this interesting finding, we propose a lightweight and effective Vector Panning Feature Alignment (VPFA) framework, which conducts CR-ReID from a novel perspective of modeling the resolution-specific feature discrepancy. Extensive experimental results on multiple CR-ReID benchmarks show that our method significantly outperforms previous state-of-the-art baseline models while obtaining higher efficiency, demonstrating the effectiveness and superiority of our model based on the new finding in this paper.

Paper Structure

This paper contains 32 sections, 8 equations, 5 figures, 11 tables, 1 algorithm.

Figures (5)

  • Figure 1: The motivation of our paper. Top: semantic offset in word embedding space. Mid: resolution direction in feature space. Bottom: our Vector Panning-based cross-resolution Feature Alignment (VPFA).
  • Figure 1: Cosine similarity between the average HR–LR difference vectors computed from two disjoint ID subsets on the Market-1501 and CUHK03 datasets. Higher similarity indicates more consistent resolution-specific offsets across identities.
  • Figure 2: An overview of the proposed Vector Panning based Feature Alignment (VPFA) framework.
  • Figure 3: t-SNE visualization of features from 12 identities. Colors indicate different person IDs, and color brightness reflects resolution level (dark: HR, light: LR). VPFA effectively narrows the gap between HR and LR representations.
  • Figure 4: Visualization of retrieval results before and after applying Vector Panning based Feature Alignment (VPFA). The left side of the figure shows the original retrieval results, while the right side shows the results after applying VPFA.
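
The consistency check described in the caption on cosine similarity between average HR–LR difference vectors can be illustrated with a small sketch. This is not the authors' code: the prototypes are synthetic, and the helper names (`make_synthetic_prototypes`, `mean_hr_lr_difference`, `split_consistency`) and the noise model are assumptions made for illustration only. The sketch splits identities into two disjoint subsets, averages the per-identity HR minus LR difference in each subset, and compares the two averages by cosine similarity.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_hr_lr_difference(hr_protos, lr_protos, ids):
    """Average of (HR prototype - LR prototype) over the given identities."""
    diffs = [[h - l for h, l in zip(hr_protos[i], lr_protos[i])] for i in ids]
    return mean_vector(diffs)


def make_synthetic_prototypes(num_ids=200, dim=64, noise=0.3, seed=0):
    """Hypothetical data: LR prototype = HR prototype - shared offset + noise.

    Real prototypes would come from a ReID backbone; this stand-in only
    mimics the assumption that resolution shifts features in one direction.
    """
    rng = random.Random(seed)
    shared_offset = [rng.gauss(0, 1) for _ in range(dim)]
    hr, lr = {}, {}
    for pid in range(num_ids):
        hr[pid] = [rng.gauss(0, 1) for _ in range(dim)]
        lr[pid] = [h - o + rng.gauss(0, noise)
                   for h, o in zip(hr[pid], shared_offset)]
    return hr, lr


def split_consistency(hr, lr, seed=0):
    """Cosine similarity of mean HR-LR offsets from two disjoint ID halves."""
    ids = sorted(hr)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    offset_a = mean_hr_lr_difference(hr, lr, ids[:half])
    offset_b = mean_hr_lr_difference(hr, lr, ids[half:])
    return cosine_similarity(offset_a, offset_b)


hr, lr = make_synthetic_prototypes()
similarity = split_consistency(hr, lr)
print(f"cosine similarity between subset offsets: {similarity:.3f}")
```

On this synthetic data the similarity is close to 1 by construction, since every identity shares the same offset. With real features, the value would indicate how far a single resolution-specific direction generalizes across identities, which is the property the caption describes.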