ReMix: Training Generalized Person Re-identification on a Mixture of Data

Timur Mamedov, Anton Konushin, Vadim Konushin

TL;DR

ReMix is a generalized person Re-ID method jointly trained on a mixture of labeled multi-camera and unlabeled single-camera data. Experiments show that it generalizes well and outperforms state-of-the-art methods in generalizable person Re-ID. To the best of the authors' knowledge, it is the first work to explore joint training on both types of data.

Abstract

Modern person re-identification (Re-ID) methods have a weak generalization ability and experience a major accuracy drop when capturing environments change. This is because existing multi-camera Re-ID datasets are limited in size and diversity, since such data is difficult to obtain. At the same time, enormous volumes of unlabeled single-camera records are available. Such data can be easily collected, and therefore, it is more diverse. Currently, single-camera data is used only for self-supervised pre-training of Re-ID methods. However, the diversity of single-camera data is suppressed by fine-tuning on limited multi-camera data after pre-training. In this paper, we propose ReMix, a generalized Re-ID method jointly trained on a mixture of limited labeled multi-camera and large unlabeled single-camera data. Effective training of our method is achieved through a novel data sampling strategy and new loss functions that are adapted for joint use with both types of data. Experiments show that ReMix has a high generalization ability and outperforms state-of-the-art methods in generalizable person Re-ID. To the best of our knowledge, this is the first work that explores joint training on a mixture of multi-camera and single-camera data in person Re-ID.

Paper Structure

This paper contains 29 sections, 9 equations, 5 figures, 12 tables, 2 algorithms.

Figures (5)

  • Figure 1: Examples of multi-camera and single-camera data. As we can see, multi-camera data is much more complex in terms of Re-ID: background, lighting, capturing angle, etc., may differ significantly for one person in multi-camera data. In contrast, images of the same person are less complex in single-camera data.
  • Figure 2: Scheme of ReMix. At the beginning of each epoch, all images from the person Re-ID dataset (multi-camera data) pass through the momentum encoder to obtain centroids for each identity (bottom part of the scheme). Simultaneously, videos are randomly sampled from the unlabeled single-camera dataset, and images from the selected videos are clustered using embeddings from the momentum encoder and pseudo labeled (top part of the scheme). After that, labeled multi-camera and pseudo labeled single-camera data are fed to the encoder as input. To train the encoder, the following new loss functions are used: the Instance Loss, the Augmentation Loss, and the Centroids Loss are calculated for both types of data, whereas the Camera Centroids Loss is calculated only for multi-camera data.
  • Figure 3: Comparison of the top-5 retrieved images on the Market-1501 dataset between ReMix and QAConv-GS [liao2022graph]. Green boxes denote correct results, while red boxes denote incorrect results.
  • Figure 4: Examples of single-camera data clusters obtained during ReMix training. Four random images from each arbitrary cluster are selected for visualization.
  • Figure 5: Visualization of activation maps of ReMix on the Market-1501 dataset.
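
The Figure 2 caption mentions two mechanisms that a small example may make more concrete: per-identity centroids computed from momentum-encoder embeddings, and the momentum encoder itself. The sketch below is a minimal illustration, not the authors' implementation. It assumes that a centroid is the plain mean of an identity's embeddings and that the momentum encoder follows the encoder through an exponential moving average. The function names, the toy vectors, and the coefficient `m` are all hypothetical.

```python
from collections import defaultdict


def identity_centroids(embeddings, labels):
    """Average the embeddings that share a label; returns {label: centroid}.

    Assumption: a centroid is the unweighted mean of an identity's embeddings.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def momentum_update(momentum_params, encoder_params, m=0.999):
    """Assumed EMA rule: theta_m <- m * theta_m + (1 - m) * theta."""
    return [
        m * pm + (1.0 - m) * pe
        for pm, pe in zip(momentum_params, encoder_params)
    ]


# Toy data: three 2-D embeddings belonging to two hypothetical identities.
embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = ["id_a", "id_a", "id_b"]
centroids = identity_centroids(embeddings, labels)

# Toy parameters; m=0.5 is chosen only to make the arithmetic easy to follow.
updated = momentum_update([1.0, 1.0], [0.0, 2.0], m=0.5)
```

The same `identity_centroids` helper would apply to the single-camera branch once cluster indices stand in for labels, since the caption describes those images as pseudo labeled after clustering.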