CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx
Lukas Picek, Elisa Belotti, Michal Bojda, Ludek Bufka, Vojtech Cermak, Martin Dula, Rostislav Dvorak, Luboslav Hrdy, Miroslav Jirik, Vaclav Kocourek, Josefa Krausova, Jiri Labuda, Jakub Straka, Ludek Toman, Vlado Trulik, Martin Vana, Miroslav Kutal
TL;DR
CzechLynx is the first large-scale, open-access dataset for individual identification, pose estimation, and instance segmentation of the Eurasian lynx. It combines 39,760 real camera-trap images of 319 individuals, collected over 15 years in two regions and annotated with 20-point skeletons, with a scalable Unity-based synthetic data pipeline that uses diffusion-based textures. The real data come from two field programs and were prepared with careful preprocessing and a semi-automated annotation workflow built on SAM and AnimalPose. The dataset defines geo-aware and time-aware evaluation protocols to stress-test models in cross-regional and long-term monitoring scenarios. The work provides a multimodal benchmark along with open-source tools for data loading and evaluation across all three tasks, supporting robust wildlife-monitoring methods and cross-study comparability.
Abstract
We introduce CzechLynx, the first large-scale, open-access dataset for individual identification, pose estimation, and instance segmentation of the Eurasian lynx (Lynx lynx). CzechLynx contains 39,760 camera-trap images annotated with segmentation masks, identity labels, and 20-point skeletons, and covers 319 unique individuals across 15 years of systematic monitoring in two geographically distinct regions: southwest Bohemia and the Western Carpathians. In addition to the real camera-trap data, we provide a large complementary set of photorealistic synthetic images and a Unity-based generation pipeline with diffusion-based text-to-texture modeling, capable of producing arbitrarily large amounts of synthetic data spanning diverse environments, poses, and coat-pattern variations. To enable systematic testing across realistic ecological scenarios, we define three complementary evaluation protocols: (i) geo-aware, (ii) time-aware open-set, and (iii) time-aware closed-set, covering cross-regional and long-term monitoring settings. With these resources, CzechLynx offers a flexible benchmark for robust evaluation of computer vision and machine learning models for wildlife monitoring.
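
To make the three evaluation protocols named in the abstract more concrete, the following is a minimal Python sketch of how such splits could be built from image metadata. It is an illustration under stated assumptions, not the dataset's released tooling: the `Record` fields, the region strings, the cutoff year, and the exact split rules (train on one region and test on the other; train on earlier years and test on later ones; closed-set keeps only test identities already seen in training) are all hypothetical and may differ from the official definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical per-image metadata; field names are assumptions."""
    image_id: str
    identity: str
    region: str
    year: int


def geo_aware_split(records, train_region):
    """Assumed geo-aware rule: train on one region, test on all others."""
    train = [r for r in records if r.region == train_region]
    test = [r for r in records if r.region != train_region]
    return train, test


def time_aware_split(records, cutoff_year, closed_set):
    """Assumed time-aware rule: train up to cutoff_year, test afterwards.

    closed_set=True keeps only test images whose identity occurs in training.
    closed_set=False (open-set) also keeps previously unseen individuals.
    """
    train = [r for r in records if r.year <= cutoff_year]
    test = [r for r in records if r.year > cutoff_year]
    if closed_set:
        known = {r.identity for r in train}
        test = [r for r in test if r.identity in known]
    return train, test


# Toy records for illustration only; these are not real dataset entries.
records = [
    Record("img_001", "lynx_A", "southwest_bohemia", 2012),
    Record("img_002", "lynx_A", "southwest_bohemia", 2020),
    Record("img_003", "lynx_B", "western_carpathians", 2015),
    Record("img_004", "lynx_C", "western_carpathians", 2021),
]

geo_train, geo_test = geo_aware_split(records, "southwest_bohemia")
open_train, open_test = time_aware_split(records, 2018, closed_set=False)
closed_train, closed_test = time_aware_split(records, 2018, closed_set=True)
```

In this toy example the open-set test split contains `lynx_C`, an individual first observed after the cutoff, while the closed-set test split drops it. That difference is the practical distinction between the two time-aware settings as assumed here.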
