Enhancing Understanding Through Wildlife Re-Identification

J. Buitenhuis

TL;DR

This work tackles wildlife re-identification by comparing metric-learning approaches to standard classification across two datasets (LionData and FriesianCattle2017). It implements three pipelines (a NumPy-based MLP, TensorFlow/Keras DCNNs trained with triplet loss, and a LightGBM-based wide model on pairwise embeddings) and evaluates them with MAP@R and per-class accuracy. The results show that embeddings taken from a classification-trained MLP do not yield meaningful metric representations, while DCNNs offer dataset-dependent gains that do not always align with prior literature; wide models tend to overfit and provide limited improvements. The study underscores the importance of suitable loss functions and model choices for wildlife metric learning, and it points to future directions such as transformer models and synthetic data to improve generalization.
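MAP@R scores an embedding space by retrieval: for each query image, R is the number of other images of the same individual, the R nearest neighbours are retrieved, and the average precision over those R positions is taken (a wrong neighbour contributes zero). The following is a minimal NumPy sketch of that definition, not the author's implementation; it assumes Euclidean distance and skips individuals with only one image, and the function name `map_at_r` is hypothetical.

```python
import numpy as np


def map_at_r(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Mean Average Precision at R (hypothetical helper, illustration only).

    embeddings: (n, d) array of embedding vectors.
    labels: (n,) array of individual IDs.
    """
    # Pairwise Euclidean distances between all embeddings (assumed metric).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A query must never retrieve itself.
    np.fill_diagonal(dists, np.inf)

    scores = []
    for i in range(len(labels)):
        # R = number of *other* samples that share the query's label.
        r = int((labels == labels[i]).sum()) - 1
        if r == 0:
            continue  # assumption: singleton individuals are skipped
        nearest = np.argsort(dists[i])[:r]
        correct = (labels[nearest] == labels[i]).astype(float)
        # Precision at each rank k = 1..R, kept only where rank k is correct.
        precision_at_k = np.cumsum(correct) / np.arange(1, r + 1)
        scores.append((precision_at_k * correct).sum() / r)
    return float(np.mean(scores))
```

A score of 1.0 means that, for every query, all R nearest neighbours belong to the same individual; unlike top-1 accuracy, MAP@R also penalizes embeddings where correct matches are ranked behind incorrect ones.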

Abstract

We explore wildlife re-identification by implementing an MLP from scratch using NumPy, a DCNN using Keras, and a binary classifier using LightGBM, as a learning exercise for an assignment, and we analyze the performance of these models on multiple datasets. We attempt to replicate prior research in metric learning for wildlife re-identification. First, we find that training an MLP for classification, removing its output layer, and using the second-to-last layer as an embedding is not a successful strategy for similarity learning; losses designed for embeddings, such as triplet loss, appear to be required. The DCNNs performed well on some datasets but poorly on others, which does not align with findings in previous literature. The LightGBM classifier overfitted heavily and was not significantly better than a constant model when trained and evaluated on all pairs with accuracy as the metric. Comparisons with documentation examples, together with good results on certain datasets, suggest that the technical implementations match standard practice. However, more work remains before past literature can be fully recreated.
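Triplet loss, which the abstract identifies as the kind of objective that embedding learning needs, trains on (anchor, positive, negative) triples: the anchor and positive show the same individual, the negative a different one, and the loss is zero only once the negative is farther from the anchor than the positive by at least a margin. The NumPy sketch below shows the plain batch-averaged form with squared Euclidean distances; the function name and the default `margin=0.2` are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.2) -> float:
    """Mean triplet loss over a batch (hypothetical helper, illustration only).

    Each input has shape (batch, dim). Uses squared Euclidean distances:
        loss = mean(max(||a - p||^2 - ||a - n||^2 + margin, 0))
    """
    # Distance from each anchor to its same-individual example.
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    # Distance from each anchor to its different-individual example.
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    # Hinge: triplets already separated by more than the margin contribute 0.
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))
```

Unlike cross-entropy on class logits, this objective directly shapes distances between embeddings, which is why a classifier's penultimate layer need not form a useful metric space. In a Keras pipeline the same expression would be written with TensorFlow ops so it can be differentiated, and the way triplets are mined (random, hard, or semi-hard) strongly affects training; this sketch does not model either choice.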

Paper Structure

This paper contains 16 sections, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Example representing the idea behind introducing synthetic training data