WildFusion: Individual Animal Identification with Calibrated Similarity Fusion

Vojtěch Cermak; Lukas Picek; Lukáš Adam; Lukáš Neumann; Jiří Matas

TL;DR

WildFusion tackles zero-shot animal re-identification by fusing calibrated global embeddings and local matching scores. It defines a global similarity $s_G$ from embeddings like MegaDescriptor or DINOv2, a local similarity $s_L$ from matches such as LoFTR or LightGlue with a threshold $\mu$, calibrates these scores to probabilities and ensembles them into a final $s_F$ using weighted averaging. The approach achieves state-of-the-art performance across 17 wildlife datasets, with an average top-1 accuracy of about $84.0\%$ and strong zero-shot results ($76.2\%$ on average with local scores alone), without any fine-tuning or dataset-specific calibration, while offering a practical shortlisting strategy for scalability. The method is validated through extensive ablations and is made available with public code and pre-trained models, underscoring its potential for practical ecology and conservation applications. Limitations include reliance on off-the-shelf local matchers trained on static imagery and a primarily offline deployment focus, suggesting avenues for tailoring local descriptors to wildlife imagery and enabling real-time identification.
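The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the logistic calibration parameters, and the equal fusion weights are all assumptions, and the local score is assumed here to be the number of matches whose confidence exceeds $\mu$.

```python
import math

def global_score(emb_a, emb_b):
    """Cosine similarity between two embedding vectors (s_G)."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    return dot / (norm_a * norm_b)

def local_score(match_confidences, mu=0.5):
    """Assumed form of s_L: number of matches with confidence above mu."""
    return sum(1 for c in match_confidences if c > mu)

def calibrate(score, slope, intercept):
    """Hypothetical logistic calibration mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))

def fuse(probabilities, weights=None):
    """Weighted average of calibrated scores (s_F); equal weights by default."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)

# Toy inputs: made-up embeddings and match confidences for one image pair.
s_g = global_score([0.6, 0.8, 0.0], [0.6, 0.8, 0.0])
s_l = local_score([0.9, 0.7, 0.3, 0.55], mu=0.5)

# Calibration parameters below are illustrative, not fitted values.
p_g = calibrate(s_g, slope=4.0, intercept=-2.0)
p_l = calibrate(s_l, slope=0.5, intercept=-1.0)
s_f = fuse([p_g, p_l])
print(f"s_G={s_g:.3f}  s_L={s_l}  s_F={s_f:.3f}")
```

Calibrating before fusing matters because the raw scores live on different scales: a cosine similarity is bounded, while a match count is not, so averaging the raw values directly would let one of them dominate.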

Abstract

We propose a new method, WildFusion, for individual identification of a broad range of animal species. The method fuses global similarity scores from deep embeddings (e.g., MegaDescriptor or DINOv2) with local matching similarity (e.g., LoFTR and LightGlue) to identify individual animals. The fusion of global and local information is facilitated by similarity score calibration. In a zero-shot setting, relying on the local similarity score only, WildFusion achieved a mean accuracy of 76.2%, measured on 17 datasets. This is better than the state-of-the-art model, MegaDescriptor-L, whose training set included 15 of the 17 datasets. If a dataset-specific calibration is applied, mean accuracy increases by 2.3 percentage points. WildFusion, with both local and global similarity scores, outperforms the state of the art significantly: mean accuracy reached 84.0%, an increase of 8.5 percentage points, and the mean relative error drops by 35%. We make the code and pre-trained models publicly available, enabling immediate use in ecology and conservation.
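The headline numbers can be cross-checked with simple arithmetic. The sketch below assumes the relative error reduction is computed from the mean accuracies quoted in the abstract; the paper may instead average per-dataset reductions, in which case the figure would differ slightly. Variable names are illustrative.

```python
# Figures quoted in the abstract (percent or percentage points).
fused_accuracy = 84.0
gain_over_baseline = 8.5
zero_shot_accuracy = 76.2
calibration_gain = 2.3

# Implied baseline accuracy and the two error rates.
baseline_accuracy = fused_accuracy - gain_over_baseline
baseline_error = 100.0 - baseline_accuracy
fused_error = 100.0 - fused_accuracy

# Assumption: relative error reduction taken from the mean accuracies.
relative_error_drop = (baseline_error - fused_error) / baseline_error

# Local-only accuracy once dataset-specific calibration is applied.
calibrated_local_accuracy = zero_shot_accuracy + calibration_gain

print(f"implied baseline accuracy: {baseline_accuracy:.1f}%")
print(f"relative error drop: {relative_error_drop:.1%}")
print(f"local-only accuracy with calibration: {calibrated_local_accuracy:.1f}%")
```

Under this assumption the implied baseline accuracy is 75.5%, so the error falls from 24.5% to 16.0%, a relative drop of 8.5 / 24.5 ≈ 0.347, which agrees with the reported 35%.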

Paper Structure (17 sections, 4 equations, 6 figures, 5 tables)

Introduction
Related work
Methodology
Global similarity score
Matching based similarity score
Score calibration
WildFusion -- Calibrated score ensembling
Datasets
Experiments
Baseline Performance
Ablation Studies
Effect of local matching score threshold
Effect of score selection
Effect of calibration
Constraining number of comparisons
...and 2 more sections

Figures (6)

Figure 1: Calibrated similarity fusion. Fusing local (in the $[0, \mathcal{R}]$ range) and global matching scores (e.g., cosine similarity) is not possible without calibration. By calibrating the outputs of any local and global matcher, we can easily fuse them and achieve better performance. Evaluated on 17 datasets, accuracy increased by 8.5 percentage points on average and relative error dropped by 35%.
Figure 2: Distinct animal features for re-identification. Based on the natural visual appearance, the most distinguishable features for animals are spots, stripes, facial landmarks, and the shape of body parts (e.g., ears for elephants and fin for whales).
Figure 3: Qualitative performance. Selected examples where WildFusion changed the decision of MegaDescriptor-L on NyalaData, WhaleSharkID, and ZindiTurtle; three correct and false samples. We suspect that some of the wrong matches are due to mislabeled data.
Figure 4: Effect of $\mu$ on performance. Solid lines represent a constant $\mu$, and dotted lines the optimal $\mu$ found on the validation set for each dataset. Fixing $\mu=0.5$ gives results comparable to the best $\mu$ chosen on the validation set.
Figure 5: Ablation on optimal number of images for model calibration. Isotonic regression calibration with fixed $\mu=0.5$ for all datasets outperforms other approaches in low data scenarios.
...and 1 more figures
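
Figure 5 names isotonic regression as the calibration method. As a rough illustration of what that involves, the sketch below fits a non-decreasing step function from raw similarity scores to match probabilities using the pool-adjacent-violators algorithm. The function names and toy data are hypothetical, and library implementations typically interpolate between fitted points rather than using a pure step lookup.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators fit.

    Returns a list of (upper_score, fitted_value) blocks with
    non-decreasing fitted values.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [sum_of_labels, count, upper_score]
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]

def isotonic_predict(model, score):
    """Return the value of the first block whose upper bound covers score."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]  # scores above the training range get the top value

# Toy calibration set: raw scores with 1 = same individual, 0 = different.
raw_scores = [1, 2, 3, 4, 5, 6]
same_identity = [0, 0, 1, 0, 1, 1]

model = isotonic_fit(raw_scores, same_identity)
calibrated = [isotonic_predict(model, s) for s in raw_scores]
print(calibrated)
```

In the toy data the labels at scores 3 and 4 violate monotonicity, so the algorithm pools them into a single block whose value is their mean.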
