Addressing the Elephant in the Room: Robust Animal Re-Identification with Unsupervised Part-Based Feature Alignment

Yingxue Yu, Vidit Vidit, Andrey Davydov, Martin Engilberge, Pascal Fua

TL;DR

The paper tackles robust animal re-identification by mitigating background bias and leveraging unsupervised part-based representations. It introduces a dual strategy: (1) systematic background removal during training and evaluation to focus on the animal, and (2) unsupervised part-aware learning via Descriptor Vector Exchange (DVE) integrated into an SE-ResNet50 backbone, with a loss combination that includes $L_{ID}$, $L_{LR}$, Circle loss $L_{reID}$, and $L_{DVE}$. Empirical results on ATRW, YakReID-103, and ELPephants demonstrate state-of-the-art performance, improved intra- and inter-species part alignment, and promising cross-species transfer, though limitations remain in masks and inter-species generalization. The work highlights the practical impact of background masking and unsupervised part alignment for wildlife Re-ID, and provides ablations, qualitative analyses, and transfer evaluations to support its claims.
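
One way to read the loss combination is as a weighted sum of the four terms. The weights $\lambda_{ID}$, $\lambda_{LR}$, $\lambda_{reID}$, and $\lambda_{DVE}$ below are assumptions introduced for illustration; this summary does not state how the terms are weighted or scheduled.

$$
L = \lambda_{ID} L_{ID} + \lambda_{LR} L_{LR} + \lambda_{reID} L_{reID} + \lambda_{DVE} L_{DVE}
$$

Under this reading, setting every weight to 1 gives the plain sum of the four losses.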

Abstract

Animal Re-ID is crucial for wildlife conservation, yet it faces unique challenges compared to person Re-ID. First, the scarcity and lack of diversity in datasets lead to background-biased models. Second, animal Re-ID depends on subtle, species-specific cues, further complicated by variations in pose, background, and lighting. This study addresses background bias by proposing a method to systematically remove backgrounds in both the training and evaluation phases. Unlike prior works that depend on pose annotations, our approach uses an unsupervised technique for feature alignment across body parts and pose variations, which makes it more practical. Our method achieves superior results on three key animal Re-ID datasets: ATRW, YakReID-103, and ELPephants.
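
As a rough illustration of the background-removal idea, the sketch below blanks every pixel that falls outside a binary foreground mask. The function name `mask_background`, the nested-list image format, and the fill value of 0 are assumptions made for this example. Producing the mask itself is done by segmentation models in the paper and is not reproduced here.

```python
def mask_background(image, mask, fill=0):
    """Return a copy of `image` with background pixels replaced by `fill`.

    image: H x W nested list of pixel values (hypothetical format).
    mask:  H x W nested list, truthy where the animal is, falsy elsewhere.
    """
    if len(image) != len(mask):
        raise ValueError("image and mask must have the same height")
    masked = []
    for image_row, mask_row in zip(image, mask):
        if len(image_row) != len(mask_row):
            raise ValueError("image and mask must have the same width")
        # Keep foreground pixels, overwrite background pixels.
        masked.append([px if keep else fill
                       for px, keep in zip(image_row, mask_row)])
    return masked
```

Applying the same function to training and evaluation images mirrors the abstract's point that backgrounds are removed in both phases, so the model never sees background cues at either stage.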

Paper Structure

This paper contains 32 sections, 10 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Proposed Animal Re-ID Approach: Addressing background bias in Re-ID models, our method masks out backgrounds to focus on the animal. It learns part-aware representations, ensuring consistency across subjects. Part-aware features are merged and a final Re-ID score is computed via cosine similarity.
  • Figure 2: Bias Towards Background: On the left, we display samples from the YakReID-103 dataset. Utilizing features from the PGCFL model (Liu et al., 2019), we identify the nearest neighbors. Each image's entity label is presented in the bottom-left corner. The retrieved images showcase four distinct entities, all sharing a remarkably similar background. This indicates that distances in the feature space are significantly influenced by background similarities. On the right, we exhibit the outcomes of our proposed background segmentation protocol. The top-left image is the original, the bottom-left depicts results from SAM, the bottom-right from ISNet, and the top-right combines outputs from both SAM and ISNet.
  • Figure 3: Overall architecture: Left - the proposed model's architecture. The input is a single image, processed through the first three layers of the backbone and a convolutional block to extract DVE features $\Phi(\mathbf{x})$. Features produced after the fifth backbone layer continue to global average pooling and a linear layer to give the Re-ID features $f(\mathbf{x})$, which then pass through two classification heads for ID class and orientation prediction via linear layers and softmax operations. Right - $L_{DVE}$ for part-aware representation.
  • Figure 4: Visualization of the features learned with $L_{DVE}$: The first two rows show intra-species part alignment; the next two rows demonstrate that a model trained solely on tigers can generalize to other species and maintain alignment even in inter-species scenarios. The final row shows the results from the PGCFL baseline. In each row, the green dot in the left image is the local query, while the red dot in the center image indicates its matching point. The right-most image provides a heatmap overlaid on the target image, showcasing the similarities between the local query and the center image.
  • Figure A.5: Sample images from YakReID-103, ATRW, and ELPephants. In each row, the first two images are of the same entity while the last image is of another.
  • ...and 3 more figures
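
The matching shown in Figure 4 (a local query point, its best match, and a similarity heatmap) can be sketched as follows. This is a minimal sketch under stated assumptions. Cosine similarity is borrowed from the final Re-ID score described in Figure 1, and using it for local descriptors is an assumption. The function names and the nested-list layout of the descriptor grid are also hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_heatmap(query, target_grid):
    """Score every descriptor in an H x W grid against one query descriptor."""
    return [[cosine_similarity(query, desc) for desc in row]
            for row in target_grid]


def best_match(heatmap):
    """Return the (row, col) of the highest score in the heatmap."""
    scored = ((score, (r, c))
              for r, row in enumerate(heatmap)
              for c, score in enumerate(row))
    return max(scored, key=lambda item: item[0])[1]
```

Here `query` plays the role of the descriptor at the green dot, `similarity_heatmap` produces the values that would be overlaid on the target image, and `best_match` gives the location of the red dot.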