Dissecting Human Body Representations in Deep Networks Trained for Person Identification

Thomas M Metz, Matthew Q Hill, Blake Myers, Veda Nandan Gandi, Rahul Chilakapati, Alice J O'Toole

TL;DR

This work probes how deep body-identification networks encode information beyond identity, including residual face cues, gender, viewpoint, and image-origin signals, by analyzing four backbones trained on nearly 2 million images across 9 datasets. Through face-obscured and face-only tests, linear readouts for gender and viewpoint, and PCA-based subspace editing, the authors show that facial information aids body-ID, yet nonidentity attributes persist in embeddings and can be exploited for improved retrieval without retraining. The study provides cross-architecture evidence that simple subspace techniques, including selective deletion of early principal components, can boost Rank-1, TAR@FAR $10^{-3}$, and mAP across datasets, while highlighting potential privacy and security implications of such leakage. These insights illuminate both opportunities for semantic editing and risks in biometric systems, and offer practical methods for improving long-term body re-identification performance with no additional training.
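
The TL;DR credits gains in Rank-1 and TAR@FAR $10^{-3}$ to deleting early principal components of the embedding space, with no retraining. The sketch below illustrates that idea on synthetic data and is not the authors' code. It assumes that PCA is fit on a separate set of embeddings, that probe and gallery embeddings are compared with cosine similarity, and that the first k components are simply dropped. The helper names (`fit_pca`, `delete_leading_pcs`, `rank1`, `tar_at_far`) and the planted "nuisance" direction are hypothetical.

```python
import numpy as np

def fit_pca(X):
    """Mean and principal axes of X; rows of Vt are sorted by explained variance."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt

def delete_leading_pcs(X, mu, Vt, k):
    """Center X and project it onto every principal axis except the first k."""
    return (X - mu) @ Vt[k:].T

def cosine_sim(A, B):
    """Pairwise cosine similarity between rows of A (probes) and rows of B (gallery)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def rank1(S, probe_ids, gallery_ids):
    """Fraction of probes whose top-scoring gallery item has the same identity."""
    return float(np.mean(gallery_ids[np.argmax(S, axis=1)] == probe_ids))

def tar_at_far(S, probe_ids, gallery_ids, far=1e-3):
    """True accept rate at the threshold that accepts a fraction `far` of impostor pairs."""
    same = probe_ids[:, None] == gallery_ids[None, :]
    genuine, impostor = S[same], S[~same]
    thr = np.quantile(impostor, 1.0 - far)
    return float(np.mean(genuine > thr))

# Hypothetical synthetic embeddings: identity centers plus one high-variance
# nuisance direction standing in for a non-identity factor such as viewpoint.
rng = np.random.default_rng(0)
n_ids, dim = 50, 64
centers = rng.normal(size=(n_ids, dim))
nuisance = rng.normal(size=dim)
nuisance /= np.linalg.norm(nuisance)

def sample(ids):
    view = rng.normal(scale=30.0, size=(len(ids), 1))
    return centers[ids] + 0.3 * rng.normal(size=(len(ids), dim)) + view * nuisance

ids = np.arange(n_ids)
train = sample(np.repeat(ids, 4))  # separate set used only to fit PCA
gallery, probe = sample(ids), sample(ids)
mu, Vt = fit_pca(train)

results = {}
for k in (0, 1):  # k=0 keeps every dimension; k=1 deletes the top PC
    S = cosine_sim(delete_leading_pcs(probe, mu, Vt, k),
                   delete_leading_pcs(gallery, mu, Vt, k))
    results[k] = (rank1(S, ids, ids), tar_at_far(S, ids, ids))
    print(f"k={k}: Rank-1={results[k][0]:.2f}, TAR@FAR=1e-3: {results[k][1]:.2f}")
```

The gain appears here only because the planted nuisance direction carries most of the variance. How many leading components to drop for real embeddings is an empirical choice, best made on validation data rather than on the test set.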

Abstract

Long-term body identification algorithms have emerged recently with the increased availability of high-quality training data. We seek to fill knowledge gaps about these models by analyzing body image embeddings from four body identification networks trained on 1.9 million images spanning 4,788 identities and 9 databases. By analyzing a diverse range of architectures (ViT, SWIN-ViT, CNN, and linguistically primed CNN), we first show that the face contributes to the accuracy of body identification algorithms and that these algorithms can identify faces to some extent, despite receiving no explicit face training. Second, we show that the representations (embeddings) generated by body identification algorithms encode information about gender, as well as image-based information, including view (yaw) and even the dataset from which the image originated. Third, we demonstrate that identification accuracy can be improved without additional training by operating directly and selectively on the learned embedding space. Using principal component analysis (PCA), we found that identity comparisons were consistently more accurate in subspaces that eliminated the dimensions explaining large amounts of variance. These three findings were surprisingly consistent across architectures and test datasets. This work represents the first analysis of body representations produced by long-term re-identification networks trained on challenging unconstrained datasets.
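
One common way to test whether an embedding encodes an attribute such as gender or yaw is a linear readout scored on held-out images. The following is a minimal sketch, assuming ridge-regression readouts (the readout used in the paper may differ) and synthetic embeddings in which gender and yaw are planted as weak linear signals. `fit_ridge`, `predict`, and all data are hypothetical. A shuffled-label control shows what a chance-level readout looks like.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; the appended bias column is not penalized."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict(w, X):
    """Linear readout score for each row of X."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Hypothetical "embeddings": isotropic noise plus weak linear traces of a
# binary attribute (coded -1/+1) and of yaw in degrees.
rng = np.random.default_rng(1)
n, dim = 2000, 128
gender = rng.choice([-1.0, 1.0], size=n)
yaw = rng.uniform(-90.0, 90.0, size=n)
g_dir = rng.normal(size=dim)
g_dir /= np.linalg.norm(g_dir)
y_dir = rng.normal(size=dim)
y_dir /= np.linalg.norm(y_dir)
X = (rng.normal(size=(n, dim))
     + 1.5 * gender[:, None] * g_dir
     + 2.0 * (yaw / 90.0)[:, None] * y_dir)

tr, te = slice(0, 1000), slice(1000, None)  # readouts are scored on held-out rows

# Gender: regress the -1/+1 code, then classify by the sign of the score.
w_g = fit_ridge(X[tr], gender[tr])
gender_acc = float(np.mean(np.sign(predict(w_g, X[te])) == gender[te]))

# Yaw: regress degrees directly and report the held-out correlation.
w_y = fit_ridge(X[tr], yaw[tr])
yaw_r = float(np.corrcoef(predict(w_y, X[te]), yaw[te])[0, 1])

# Control: the same readout trained on shuffled labels should sit near chance.
w_c = fit_ridge(X[tr], rng.permutation(gender[tr]))
control_acc = float(np.mean(np.sign(predict(w_c, X[te])) == gender[te]))

print(f"gender accuracy {gender_acc:.2f}, yaw r {yaw_r:.2f}, control {control_acc:.2f}")
```

Readout accuracy well above the shuffled control is evidence that an attribute is linearly decodable from the embedding; on its own, it does not show that the network relies on that attribute for identification.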

Paper Structure

This paper contains 27 sections, 12 figures, 2 tables, and 2 algorithms.

Figures (12)

  • Figure 1: Sample images from the DeepChange dataset. Visible subject faces are blurred in accordance with DeepChange publication guidelines.
  • Figure 2: The performance of the four body identification networks decreases when facial information is obscured.
  • Figure 3: Sample cropped face images from the body dataset OToole_2005. All subjects consented to publication.
  • Figure 4: Body networks learn residual information about faces during training. These networks can perform unconstrained face identification tasks to some degree.
  • Figure 5: Body networks learn residual information about faces during training. This information generalizes to high-quality faces.
  • ...and 7 more figures