WildFusion: Individual Animal Identification with Calibrated Similarity Fusion
Vojtěch Cermak, Lukas Picek, Lukáš Adam, Lukáš Neumann, Jiří Matas
TL;DR
WildFusion addresses individual animal re-identification by fusing calibrated global and local similarity scores. It defines a global similarity $s_G$ from deep embeddings such as MegaDescriptor or DINOv2 and a local similarity $s_L$ from keypoint matchers such as LoFTR or LightGlue with a threshold $\mu$. Both scores are calibrated to probabilities and combined by weighted averaging into a final score $s_F$. With both scores, the method reaches a state-of-the-art mean top-1 accuracy of $84.0\%$ across 17 wildlife datasets. In the zero-shot setting, using local scores alone and no fine-tuning or dataset-specific calibration, it averages $76.2\%$. A shortlisting strategy keeps the approach practical at scale. The method is validated through extensive ablations, and the code and pre-trained models are publicly available for use in ecology and conservation. Limitations include reliance on off-the-shelf local matchers trained on static imagery and a primarily offline deployment focus, which suggests tailoring local descriptors to wildlife imagery and enabling real-time identification as future work.
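The pipeline summarized above (compute $s_G$ and $s_L$, calibrate each to a probability, then average) can be illustrated with a minimal sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: $s_G$ is taken to be cosine similarity, $s_L$ is taken to be the count of matches whose confidence exceeds $\mu$, calibration is a simple one-dimensional logistic fit, and the names `fit_calibration`, `fuse`, and `weight` as well as all toy numbers are hypothetical.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def global_similarity(emb_a, emb_b):
    # s_G (assumed form): cosine similarity between two global embeddings.
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    return dot / (norm_a * norm_b)

def local_similarity(match_confidences, mu=0.5):
    # s_L (assumed form): number of keypoint matches with confidence above mu.
    return sum(1 for c in match_confidences if c > mu)

def fit_calibration(scores, labels, lr=0.5, steps=500):
    # Assumed calibrator: logistic regression on one standardized score,
    # fitted by gradient descent on pairs labeled 1 (same individual) or 0.
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n) or 1.0
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            z = (s - mean) / std
            p = sigmoid(a * z + b)
            grad_a += (p - y) * z
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n

    def calibrate(score):
        # Map a raw similarity score to an estimated match probability.
        return sigmoid(a * (score - mean) / std + b)

    return calibrate

def fuse(s_g, s_l, cal_g, cal_l, weight=0.5):
    # s_F: weighted average of the two calibrated probabilities.
    return weight * cal_g(s_g) + (1.0 - weight) * cal_l(s_l)

# Toy calibration pairs (made-up values, not data from the paper).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
global_scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
local_scores = [120, 80, 60, 10, 15, 8, 5, 2]

cal_g = fit_calibration(global_scores, labels)
cal_l = fit_calibration(local_scores, labels)

fused_same = fuse(0.85, 100, cal_g, cal_l)  # pair that looks like a match
fused_diff = fuse(0.15, 3, cal_g, cal_l)    # pair that looks like a non-match
print(f"s_F same={fused_same:.3f}, s_F different={fused_diff:.3f}")
```

Calibration is what makes the average meaningful: a raw cosine similarity in $[-1, 1]$ and a raw match count in the tens or hundreds are not on comparable scales, whereas two probabilities are.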
Abstract
We propose a new method, WildFusion, for individual identification of a broad range of animal species. The method fuses similarity scores from deep global embeddings (e.g., MegaDescriptor or DINOv2) with local matching similarity (e.g., LoFTR and LightGlue) to identify individual animals. The fusion of global and local information is facilitated by similarity score calibration. In a zero-shot setting, relying on the local similarity score only, WildFusion achieved a mean accuracy of 76.2%, measured on 17 datasets. This is better than the state-of-the-art model, MegaDescriptor-L, whose training set included 15 of the 17 datasets. If a dataset-specific calibration is applied, mean accuracy increases by 2.3 percentage points. WildFusion, with both local and global similarity scores, outperforms the state of the art significantly: mean accuracy reached 84.0%, an increase of 8.5 percentage points, and the mean relative error drops by 35%. We make the code and pre-trained models publicly available, enabling immediate use in ecology and conservation.
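The headline numbers can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 8.5-point gain is measured against the previous state of the art and that the relative error reduction is computed from the mean accuracies. The paper may instead average per-dataset reductions, in which case the figure would differ slightly.

```python
# Figures quoted in the abstract (percent).
fused_accuracy = 84.0
gain_points = 8.5

# Implied baseline accuracy and the corresponding error rates.
baseline_accuracy = fused_accuracy - gain_points
baseline_error = 100.0 - baseline_accuracy
fused_error = 100.0 - fused_accuracy

# Relative reduction in error, as a fraction of the baseline error.
relative_error_drop = (baseline_error - fused_error) / baseline_error
print(f"baseline accuracy: {baseline_accuracy:.1f}%")
print(f"relative error drop: {relative_error_drop:.1%}")
```

Under these assumptions the implied baseline is 75.5% and the relative error reduction is roughly 35%, which agrees with the reported figure.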
