Diverse Deep Feature Ensemble Learning for Omni-Domain Generalized Person Re-identification

Eugene P. W. Ang, Shan Lin, Alex C. Kot

TL;DR

This work proposes a way to achieve Omni-Domain Generalization Person ReID by creating deep feature diversity with self-ensembles. The method significantly improves on or matches SOTA performance on major domain generalization and single-domain supervised benchmarks.

Abstract

Person Re-identification (Person ReID) has progressed to a level where single-domain supervised performance has saturated. However, such methods suffer a significant drop in performance when trained and tested on different datasets, which has motivated the development of domain generalization techniques. Our research reveals that domain generalization methods, in turn, significantly underperform single-domain supervised methods on single-dataset benchmarks. An ideal Person ReID method should be effective regardless of the number of domains involved, and when test-domain data is available for training it should perform as well as state-of-the-art (SOTA) fully supervised methods. We call this paradigm Omni-Domain Generalization Person ReID (ODG-ReID). We propose a way to achieve ODG-ReID by creating deep feature diversity with self-ensembles. Our method, Diverse Deep Feature Ensemble Learning (D2FEL), deploys unique instance normalization patterns that generate multiple diverse views and recombines these views into a compact encoding. To the best of our knowledge, our work is one of the few to consider omni-domain generalization in Person ReID, and we advance the study of applying feature ensembles in Person ReID. D2FEL significantly improves on or matches SOTA performance on major domain generalization and single-domain supervised benchmarks.
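
The self-ensemble idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_block` is a hypothetical stand-in for a bottleneck layer, the feature map is a tiny hand-written list, and recombining the views by concatenation is an assumption made only for illustration. The sketch shows how switching instance normalization (IN) on or off in each of two blocks yields four distinct views of the same input.

```python
from itertools import product

def instance_norm(fmap, eps=1e-5):
    """Normalize each channel of one sample over its spatial positions."""
    out = []
    for channel in fmap:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        out.append([(v - mean) / (var + eps) ** 0.5 for v in channel])
    return out

def toy_block(fmap, use_in):
    """Hypothetical stand-in for a bottleneck: fixed affine map, optional IN, then ReLU."""
    out = [[2.0 * v + 1.0 for v in channel] for channel in fmap]
    if use_in:
        out = instance_norm(out)
    return [[max(v, 0.0) for v in channel] for channel in out]

def encode(fmap, pattern):
    """Run the toy blocks under one IN pattern, then average-pool each channel."""
    for use_in in pattern:
        fmap = toy_block(fmap, use_in)
    return [sum(channel) / len(channel) for channel in fmap]

def ensemble_encoding(fmap, num_blocks=2):
    """Concatenate the views from every IN on/off pattern (concatenation is an assumption)."""
    patterns = list(product((False, True), repeat=num_blocks))
    views = [encode(fmap, p) for p in patterns]
    return patterns, [v for view in views for v in view]

fmap = [[0.1, 0.5, 0.9], [1.0, 2.0, 4.0]]  # 2 channels, 3 spatial positions
patterns, encoding = ensemble_encoding(fmap)
print(len(patterns), len(encoding))  # 4 patterns, 8 values (4 views x 2 channels)
```

With two blocks there are 2^2 = 4 on/off patterns, so the concatenated encoding is four times the size of a single view. That growth is why a compact recombination step is needed.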

Paper Structure

This paper contains 20 sections, 5 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Left: Structure of a bottleneck layer in a ResNet, where IN can be selectively applied at the end. Right: all possible IN-combinations applied to the final two bottlenecks of the network.
  • Figure 2: A summary of our D$^2$FEL framework, which uses partial copies of itself to form an ensemble. Each copy is specialized with a unique pattern of instance normalization.
  • Figure 3: We compare the effects of averaging the coordinate values of class logits vs. regular output features. Even though logits do not perform as well as features for ReID, a pattern is clear: averaging class logits boosts performance because they are semantically aligned by design, but averaging features reduces performance because their coordinates do not necessarily align.
  • Figure 4: Design of our deep auto-encoder experiments.
  • Figure 5: Comparing PCA vs. Random Projection performance on the C+D+MS $\rightarrow$ M benchmark as target dimensions are reduced. The original dimension is 16384. Down to a target dimension of around 2048, both methods are comparable.
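
The random projection side of the comparison in Figure 5 can be sketched in a few lines. The sketch assumes a Gaussian random matrix, which is one common choice; the source does not say which variant the paper uses. The sizes are toy values (512 to 128) rather than the 16384 to 2048 setting in the caption, and PCA is omitted because it requires an eigendecomposition. The point is that a data-independent projection roughly preserves pairwise distances, which is what retrieval by feature distance relies on.

```python
import math
import random

def random_projection_matrix(in_dim, out_dim, seed=0):
    """Gaussian random matrix scaled so squared norms are preserved in expectation."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]

def project(matrix, vec):
    """Multiply the (out_dim x in_dim) matrix by a vector of length in_dim."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

rng = random.Random(1)
in_dim, out_dim = 512, 128  # toy sizes, not the paper's dimensions
a = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
b = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]

R = random_projection_matrix(in_dim, out_dim)
ratio = sq_dist(project(R, a), project(R, b)) / sq_dist(a, b)
print(len(project(R, a)), round(ratio, 2))  # projected length, distance ratio near 1
```

The ratio of projected to original squared distance concentrates around 1 as the target dimension grows. That is consistent with the caption's observation that the two methods stay comparable until the target dimension becomes small.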