
Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement

De Cheng, Xiaojian Huang, Nannan Wang, Lingfeng He, Zhihui Li, Xinbo Gao

TL;DR

A Dual Optimal Transport Label Assignment (DOTLA) framework that simultaneously assigns the labels generated in one modality to its counterpart modality, plus a cross-modality neighbor consistency guided label refinement and regularization module that eliminates the negative effects of inaccurate supervision signals.

Abstract

Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to learn modality-invariant features from unlabeled cross-modality datasets, which is crucial for practical applications in video surveillance systems. The key to addressing the USL-VI-ReID task is solving the cross-modality data association problem for subsequent heterogeneous joint learning. To this end, we propose a Dual Optimal Transport Label Assignment (DOTLA) framework that simultaneously assigns the labels generated in one modality to its counterpart modality. DOTLA formulates a mutually reinforcing and efficient solution to cross-modality data association, which can effectively reduce the side effects of insufficient and noisy label associations. In addition, we propose a cross-modality neighbor consistency guided label refinement and regularization module to eliminate the negative effects of inaccurate supervision signals, under the assumption that the prediction or label distribution of each example should be similar to those of its nearest neighbors. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method, which surpasses the existing state-of-the-art approach by a large margin of 7.76% mAP on average and even outperforms some supervised VI-ReID methods.
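To make the cross-modality label assignment idea concrete, here is a minimal Python sketch of optimal-transport label transfer in both directions. It is not the authors' implementation. The cosine cost against the other modality's cluster centroids, the entropic regularization `epsilon`, the uniform (equal-size) label marginals, the fixed Sinkhorn iteration count, and all function names (`sinkhorn`, `assign_labels`, `dual_assignment`) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sinkhorn(cost: Matrix, epsilon: float = 0.1, n_iters: int = 200) -> Matrix:
    """Entropic optimal transport with uniform marginals (rows: samples, cols: labels)."""
    n, k = len(cost), len(cost[0])
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]  # Gibbs kernel
    r = [1.0 / n] * n    # every sample carries the same mass
    col = [1.0 / k] * k  # assumption: every label receives the same mass
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [col[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


def assign_labels(features: Matrix, centroids: Matrix, epsilon: float = 0.1) -> List[int]:
    """Transfer cluster labels from one modality to the samples of the other."""
    cost = [[1.0 - cosine(f, c) for c in centroids] for f in features]
    plan = sinkhorn(cost, epsilon)
    # Each sample takes the label that receives most of its transported mass.
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


def dual_assignment(vis_feats: Matrix, ir_feats: Matrix,
                    vis_centroids: Matrix, ir_centroids: Matrix) -> Tuple[List[int], List[int]]:
    """Assign in both directions: visible labels -> infrared, infrared labels -> visible."""
    ir_from_vis = assign_labels(ir_feats, vis_centroids)
    vis_from_ir = assign_labels(vis_feats, ir_centroids)
    return ir_from_vis, vis_from_ir


if __name__ == "__main__":
    vis_centroids = [[1.0, 0.0], [0.0, 1.0]]
    ir_feats = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
    print(assign_labels(ir_feats, vis_centroids))  # [0, 1, 0, 1]
```

Equal column marginals are one way to keep a single cluster from absorbing most of the other modality's samples; this balance constraint is an illustrative assumption, not a detail confirmed by the abstract.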

Paper Structure

This paper contains 19 sections, 13 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Overall framework of the proposed method. The framework contains three main components: Dual Optimal Transport Label Assignment (DOTLA), Neighbor Consistency guided Label Refinement (NCLR), and Cross-modality Neighbor Consistency Regularization (CNCR). Circles and triangles represent instances from the infrared and visible modalities, respectively. Different colors stand for different identities.
  • Figure 2: Hyper-parameter analysis of $\gamma$ and $\alpha$ on the SYSU-MM01 dataset.
  • Figure 3: t-SNE [van2008visualizing] visualization of the learned features for 10 randomly selected identities. Different colors represent different ground-truth identities. "$\mathbf{\circ}$" denotes samples from the visible modality and "$\mathbf{\Diamond}$" samples from the infrared modality.
  • Figure 4: Variation of the inconsistency-score distribution of infrared samples from the RegDB dataset during model training. We compare (a) Baseline+DOTLA and (b) Baseline+DOTLA+NCLR+CNCR.
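The NCLR and CNCR components in Figure 1, and the inconsistency scores in Figure 4, rest on the assumption that each example's label distribution should resemble those of its nearest neighbors. The following minimal Python sketch illustrates that idea under stated assumptions: neighbors are found by cosine similarity within a pool of samples (for instance, the counterpart modality); the refined label is a convex blend controlled by a hypothetical `mix` weight (not necessarily the paper's $\alpha$); and the inconsistency score is taken to be KL(own label || neighbor mean), which may differ from the paper's definition. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neighbor_label_mean(query: Vector, pool_feats: List[Vector],
                        pool_labels: List[Vector], k: int) -> List[float]:
    """Average the soft labels of the k pool samples most similar to `query`."""
    order = sorted(range(len(pool_feats)),
                   key=lambda j: cosine(query, pool_feats[j]), reverse=True)
    top = order[:k]
    num_classes = len(pool_labels[0])
    return [sum(pool_labels[j][c] for j in top) / len(top) for c in range(num_classes)]


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q); eps keeps the log finite when q has zero entries."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def refine_labels(feats: List[Vector], labels: List[Vector],
                  pool_feats: List[Vector], pool_labels: List[Vector],
                  k: int = 2, mix: float = 0.5) -> Tuple[List[List[float]], List[float]]:
    """Blend each soft label with its neighbors' mean and score their disagreement."""
    refined, scores = [], []
    for f, y in zip(feats, labels):
        nb = neighbor_label_mean(f, pool_feats, pool_labels, k)
        # Convex blend: stays a valid distribution if y and nb are distributions.
        refined.append([mix * a + (1.0 - mix) * b for a, b in zip(y, nb)])
        # A high score flags a label that disagrees with its neighborhood.
        scores.append(kl_divergence(y, nb))
    return refined, scores
```

With this reading, a sample whose pseudo-label contradicts its neighbors has its label pulled toward the neighborhood consensus and receives a large inconsistency score, while a consistent sample is left unchanged with a score near zero.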