Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris

Wenshuo Li, Majid Mirmehdi, Tilo Burghardt

TL;DR

This work addresses the limitations of vision-only animal re-identification by introducing dermatoglyphic ACE textual descriptors to encode coat-pattern topology. It couples a robust text-based ACE encoding with a visual-textual co-synthesis pipeline to generate large-scale, biologically grounded synthetic data and trains a cross-modal retrieval system. Key findings include near-perfect text-only Re-ID (≈99.8%), substantial gains from synthetic data for text-to-image retrieval, and robustness improvements via anchor permutation. The approach advances explainable, language-guided animal biometrics with practical implications for ecological monitoring and data-efficient Re-ID across modalities.

Abstract

Biologists have long combined visuals with textual field notes to re-identify (Re-ID) animals. Contemporary AI tools automate this for species with distinctive morphological features but remain largely image-based. Here, we extend Re-ID methodologies by incorporating precise dermatoglyphic textual descriptors, an approach used in forensics but new to ecology. We demonstrate that these specialist semantics abstract and encode animal coat topology using human-interpretable language tags. Drawing on 84,264 manually labelled minutiae across 3,355 images of 185 tigers (Panthera tigris), we evaluate this visual-textual methodology, revealing novel capabilities for cross-modal identity retrieval. To optimise performance, we developed a text-image co-synthesis pipeline to generate 'virtual individuals', each comprising dozens of life-like visuals paired with dermatoglyphic text. Benchmarking against real-world scenarios shows this augmentation significantly boosts AI accuracy in cross-modal retrieval while alleviating data scarcity. We conclude that dermatoglyphic language-guided biometrics can overcome vision-only limitations, enabling textual-to-visual identity recovery underpinned by human-verifiable matchings. This represents a significant advance towards explainability in Re-ID and a language-driven unification of descriptive modalities in ecological monitoring.

Paper Structure

This paper contains 19 sections, 13 equations, 10 figures.

Figures (10)

  • Figure 1: From Visual Minutiae Features to Dermatoglyphic Text Descriptions. Topological definitions of four common dermatoglyphic structural details (i.e. minutiae) are shown at the bottom left, with examples of these precise structures within a fingerprint to their right, as routinely determined during forensic analysis. Above to the left, corresponding stripe arrangements are displayed on a synthesised tiger coat pattern. Following arrows to the right, the principles of the fingerprint ACE process (stevenage2016fact; needham2022collaborative) are illustrated. Minutiae are identified along anatomically informed scan paths, sequentially encoded, and transformed into structured textual form. The resulting text is precisely interpretable, manually checkable against the visual, and at the same time compactly captures the individuality of the encoded pattern. This enables both 'white box' matching and compact descriptor assembly across pattern categories beyond the scope of traditional computer vision-only models.
  • Figure 2: Visual-textual Co-Synthesis of Virtual Animal Coats. Utilising spatial statistics of minutiae types in real anatomical pattern distributions across a population, we assemble full stripe textures based on a large minutiae library. The latter is constructed through keypoint augmentation, from which region-specific instances are sampled. ACE descriptors and language descriptions are paired with their expression in full-stripe textures to, once rendered, represent a virtual individual multi-modally. This forms the basis for producing any number of virtually rendered animal individuals simulating camera trap scenarios in order to produce realistic imagery for cross-modal AI training.
  • Figure 3: Distortion-corrected Texture Mapping and 3D Pose Modelling. (a): Texture distortion caused by non-linear UV projections is corrected using RBF approximation, preserving anatomical consistency of the pelage pattern. (b): A controllable skeletal system with region-specific bindings enables natural 3D pose modelling by adjusting joints according to camera trap references. Examples of resulting animal poses that are realistically observable in real-world camera trap settings are shown on the right.
  • Figure 4: Biomimetic Pelage Synthesis. Anatomically-driven guides on the model surface are segmented by hair length (short, medium, long) across anatomical regions; fur is generated along these guides, with pelage shapes driven by orientation and length parameters, and physical simulations achieved through perturbation, clumping, and frizz. The resulting fur textures (right) have high fidelity and reduce the semantic gap to real population photography that would otherwise impede network training.
  • Figure 5: Visual Synthesis of 24,000 Camera Trap Images of Virtual Tiger Identities. (a): Where current model-free AI synthesis would struggle to generate virtual individuals with truly matching image-text pairs, our automated virtual image construction pipeline produces pattern-consistent visualisations of life-like (virtual) tigers in camera trap scenarios. Virtual cameras capture 3D animal meshes with distinct coat pattern identity from multiple, task-specific angles under realistic HDRI lighting, with backgrounds from real camera trap images, and are post-processed via harmonisation and augmentation, mimicking field conditions to reduce the semantic gap to real-world imagery. (b): 24 samples within resolution range 301x156 to 413x195 from the 24,000 resulting multi-view animal renderings used for system training, labelled with corresponding virtual animal IDs and exemplifying variations in real-world illumination and viewpoints akin to camera trapping. Note that identities are assigned, as in field protocols in use (li2020atrw), to single sides (left or right) of the tiger. The reader may confirm virtual identities themselves to experience the difficulty of Re-ID from images for the species at hand.
  • ...and 5 more figures
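
The Figure 1 caption describes minutiae being identified along scan paths, sequentially encoded, and turned into structured text. A minimal sketch of that idea follows. It is not the paper's ACE format: the minutia labels ("ending", "bifurcation", "island"), the region names, the scan-path coordinate `s`, and the output layout are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minutia:
    kind: str    # hypothetical minutia label, e.g. "bifurcation"
    region: str  # hypothetical anatomical region, e.g. "flank"
    s: float     # assumed position along the region's scan path, 0.0 to 1.0


def encode_minutiae(minutiae, region_order):
    """Order minutiae by region, then by scan-path position, and emit text."""
    rank = {name: i for i, name in enumerate(region_order)}
    ordered = sorted(minutiae, key=lambda m: (rank[m.region], m.s))
    parts = []
    for region in region_order:
        kinds = [m.kind for m in ordered if m.region == region]
        if kinds:  # skip regions without any labelled minutiae
            parts.append(f"{region}: " + " ".join(kinds))
    return "; ".join(parts)


example = [
    Minutia("ending", "flank", 0.7),
    Minutia("bifurcation", "flank", 0.2),
    Minutia("island", "hindleg", 0.5),
]
print(encode_minutiae(example, ["flank", "hindleg"]))
# prints: flank: bifurcation ending; hindleg: island
```

Because the output is a plain ordered string, a human can check each token against the image, which is the 'white box' property the caption refers to.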