Camera-Invariant Meta-Learning Network for Single-Camera-Training Person Re-identification

Jiangbo Pei; Zhuqing Jiang; Aidong Men; Haiying Wang; Haiyong Luo; Shiping Wen

TL;DR

This article proposes a novel solution, the camera-invariant meta-learning network (CIMN), for single-camera-training person re-identification (SCT re-ID). CIMN achieves comparable performance with and without cross-camera same-person (CCSP) data and outperforms state-of-the-art methods on three SCT re-ID benchmarks.

Abstract

Single-camera-training person re-identification (SCT re-ID) aims to train a re-ID model on SCT datasets, in which each person appears in only one camera. The main challenge of SCT re-ID is to learn camera-invariant feature representations without cross-camera same-person (CCSP) data as supervision. Previous methods address this challenge by assuming that the most similar person should be found in another camera. However, this assumption does not always hold. In this paper, we propose a Camera-Invariant Meta-Learning Network (CIMN) for SCT re-ID. CIMN is based on the assumption that camera-invariant feature representations should be robust to camera changes. To this end, we split the training data into a meta-train set and a meta-test set based on camera IDs and perform a cross-camera simulation via a meta-learning strategy, which enforces the representations learned from the meta-train set to be robust to the meta-test set. With the cross-camera simulation, CIMN can learn camera-invariant and identity-discriminative representations even when no CCSP data are available. However, the simulation also separates the meta-train set from the meta-test set, which ignores some beneficial relations between them. We therefore introduce three losses, the meta triplet loss, the meta classification loss, and the meta camera alignment loss, to exploit these ignored relations. Experimental results demonstrate that our method achieves comparable performance with and without CCSP data and outperforms state-of-the-art methods on SCT re-ID benchmarks. In addition, it also improves the domain generalization ability of the model.
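The cross-camera simulation follows the general meta-learning pattern of taking an inner step on the meta-train loss, evaluating the updated parameters on the meta-test loss, and backpropagating through the inner step. The sketch below is a toy, scalar illustration of that pattern, assuming a MAML-style formulation with a single inner gradient step. It is not CIMN's implementation, and the paper's exact formulation may differ. The quadratic losses `meta_train_loss` and `meta_test_loss`, the targets `a` and `b`, and the step sizes `alpha` and `beta` are all hypothetical.

```python
# Toy, scalar version of a MAML-style cross-camera simulation step.
# Assumptions (not from the paper): one scalar parameter w, quadratic
# stand-in losses for the meta-train and meta-test cameras, and
# hypothetical step sizes alpha (inner) and beta (outer).


def meta_train_loss(w: float, a: float) -> float:
    """Stand-in loss on the meta-train camera, minimized at w = a."""
    return (w - a) ** 2


def meta_test_loss(w: float, b: float) -> float:
    """Stand-in loss on the meta-test camera, minimized at w = b."""
    return (w - b) ** 2


def simulation_loss(w: float, a: float, b: float, alpha: float) -> float:
    """L_tr(w) + L_te(w'), where w' is w after one inner step on L_tr."""
    w_prime = w - alpha * 2.0 * (w - a)  # inner (meta-train) update
    return meta_train_loss(w, a) + meta_test_loss(w_prime, b)


def simulation_grad(w: float, a: float, b: float, alpha: float) -> float:
    """Exact gradient of simulation_loss w.r.t. w (chain rule through w')."""
    w_prime = w - alpha * 2.0 * (w - a)
    dwprime_dw = 1.0 - 2.0 * alpha  # derivative of the inner update
    return 2.0 * (w - a) + 2.0 * (w_prime - b) * dwprime_dw


def train(w=0.0, a=1.0, b=3.0, alpha=0.1, beta=0.05, steps=200):
    """Outer-loop (meta-optimization) gradient descent on the simulation loss."""
    for _ in range(steps):
        w -= beta * simulation_grad(w, a, b, alpha)
    return w


if __name__ == "__main__":
    print(round(train(), 3))  # 1.976
```

The meta-test term is evaluated at the updated parameter, so the outer update does not settle at the meta-train optimum `a = 1`. It converges to a compromise between the two optima (about 1.976 here). This mirrors the intuition of requiring what is learned on one camera to hold up on another.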

Paper Structure (40 sections, 15 equations, 6 figures, 10 tables, 1 algorithm)

This paper contains 40 sections, 15 equations, 6 figures, 10 tables, 1 algorithm.

Introduction
Related Work
Conventional Person Re-Identification
Single-Camera-Training Person Re-Identification
Meta-Learning
Domain Generalizable Person Re-Identification
Approach
Overview
Meta-Batch Preparation
Cross-Camera Simulation
Meta-Train
Meta-Test
Simulation Loss
Meta-Optimization
Meta Triplet Loss
...and 25 more sections

Figures (6)

Figure 1: Comparison between the SCT re-ID setting and previous re-ID settings. Different colors represent different identities. Conventional supervised re-ID data consist of a large number of people appearing in multiple cameras, and many samples belong to the same identity but are captured by different cameras (CCSP data), such as ID-1 in this figure. In this setting, the identity of each sample is known. Unsupervised re-ID has no identity annotations, but it still requires latent CCSP data in the training set (such as the blue samples). In weakly supervised re-ID, data are usually weakly annotated. We show the example of wang2019weakly, which annotates data at the bag level, so the identity of each individual sample is unknown. Like unsupervised re-ID, this setting also needs CCSP data for training. In contrast, under the single-camera-training (SCT) setting, each person appears in only one camera, so there are no CCSP data. Compared with previous settings, data for this setting are easy to collect and annotate.
Figure 2: Overview of the proposed CIMN. Three cameras are shown for illustration. Before training, we first construct camera pairs. We then build a meta-train set and a meta-test set from each pair to construct meta-batches. During training, CIMN cuts off the relation between the two sets to perform cross-camera simulation with the meta-train and meta-test processes, computing the simulation loss $L_{smi}$. After that, the meta triplet loss $L_{mtri}$, the meta classification loss $L_{cl}$, and the meta camera alignment loss $L_{mca}$ are imposed to exploit the negative and positive relations between the two sets. Finally, all losses are combined to optimize the model.
Figure 3: Construction of the training sets for the standard setting (Market-STD and Duke-STD), the SCT setting (Market-SCT and Duke-SCT), and the control group setting (Market-CG and Duke-CG).
Figure 4: The structure of our baseline model.
Figure 5: Performance trends as the amount of CCSP data changes, on the Market-1501 dataset (a) and the DukeMTMC-reID dataset (b).
...and 1 more figure
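Figures 1 to 3 rely on a few data-level operations: detecting CCSP data, deriving an SCT training set from a multi-camera one, forming camera pairs and splitting each pair into meta-train and meta-test sets, and using the resulting cross-set negatives. The sketch below illustrates these operations under stated assumptions and is not the authors' code. The following are all hypothetical choices for illustration:

- the label format `(feature, person_id, camera_id)`
- the function names
- using ordered camera pairs
- keeping one randomly chosen camera per identity
- the batch-hard form of `meta_triplet_loss` with margin 0.3

```python
import itertools
import math
import random
from collections import defaultdict

# Hypothetical label format: (feature, person_id, camera_id), feature = tuple of floats.


def ccsp_identities(samples):
    """Person IDs observed by more than one camera, i.e. identities with CCSP data."""
    cams = defaultdict(set)
    for _, pid, cam in samples:
        cams[pid].add(cam)
    return {pid for pid, cs in cams.items() if len(cs) > 1}


def make_sct_subset(samples, seed=0):
    """Keep each identity's samples from one camera (random choice is an assumption)."""
    rng = random.Random(seed)
    cams = defaultdict(set)
    for _, pid, cam in samples:
        cams[pid].add(cam)
    keep = {pid: rng.choice(sorted(cs)) for pid, cs in cams.items()}
    return [s for s in samples if s[2] == keep[s[1]]]


def camera_pairs(samples):
    """All ordered (meta-train camera, meta-test camera) pairs; ordering is assumed."""
    cams = sorted({cam for _, _, cam in samples})
    return list(itertools.permutations(cams, 2))


def split_by_camera_pair(samples, train_cam, test_cam):
    """Meta-train set from one camera, meta-test set from the other."""
    meta_train = [s for s in samples if s[2] == train_cam]
    meta_test = [s for s in samples if s[2] == test_cam]
    return meta_train, meta_test


def meta_triplet_loss(meta_train, meta_test, margin=0.3):
    """Batch-hard triplet loss that takes its negatives from the meta-test set.

    Under SCT no identity spans two cameras, so every (meta-train, meta-test)
    pair is a negative pair. Negatives from other meta-train identities are
    omitted to isolate the cross-set relation (a simplification).
    """
    losses = []
    for i, (f_a, pid_a, _) in enumerate(meta_train):
        pos = [math.dist(f_a, f) for j, (f, pid, _) in enumerate(meta_train)
               if pid == pid_a and j != i]
        neg = [math.dist(f_a, f) for f, _, _ in meta_test]
        if pos and neg:
            # Hardest positive vs. hardest (closest) cross-camera negative.
            losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy multi-camera data: identity 1 is seen by cameras c1 and c2 (CCSP data).
    std = [((0.0, 0.0), 1, "c1"), ((0.2, 0.1), 1, "c1"), ((0.1, 0.0), 1, "c2"),
           ((3.0, 3.0), 2, "c2"), ((3.1, 2.9), 2, "c2"), ((0.3, 0.2), 3, "c1")]
    print(ccsp_identities(std))  # {1}
    sct = make_sct_subset(std)
    print(ccsp_identities(sct))  # set()
    for train_cam, test_cam in camera_pairs(sct):
        meta_train, meta_test = split_by_camera_pair(sct, train_cam, test_cam)
        print(train_cam, test_cam, meta_triplet_loss(meta_train, meta_test))
```

Treating every cross-set pair as negative is only safe because of the SCT guarantee that `ccsp_identities` checks. If an identity appeared in both cameras of a pair, some of those "negatives" would actually be positives.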