Temporally-consistent 3D Reconstruction of Birds
Johannes Hägerlind, Jonas Hentati-Sundberg, Bastian Wandt
TL;DR
This work tackles reconstructing the 3D pose and shape of common murres from monocular video, addressing rapid, non-rigid bird motion. It introduces a modular pipeline (detection, segmentation, tracking, and temporally consistent 3D fitting of a parametric bird model), augmented with a temporal loss that enforces smooth motion. A real-world Baltic Seabird Project dataset of 10K frames (roughly nine birds per frame) and a smaller labeled test set enable assessment of both 2D and 3D consistency. Results show that temporal optimization reduces 2D keypoint reprojection errors and yields more plausible motion, achieving state-of-the-art performance on challenging sequences in the dataset.
Abstract
This paper deals with 3D reconstruction of seabirds, which have recently come into the focus of environmental scientists as valuable bio-indicators of environmental change. Such 3D information is beneficial for analyzing the birds' behavior and physiological shape, for example by tracking motion, shape, and appearance changes. From a computer vision perspective, birds are especially challenging due to their rapid and oftentimes non-rigid motions. We propose an approach to reconstruct the 3D pose and shape from monocular videos of a specific species of seabird, the common murre. Our approach comprises a full pipeline of detection, tracking, segmentation, and temporally consistent 3D reconstruction. Additionally, we propose a temporal loss that extends current single-image 3D bird pose estimators to the temporal domain. Moreover, we provide a real-world dataset of 10000 frames of video observations that capture nine birds simultaneously on average, comprising a large variety of motions and interactions and including a smaller test set with bird-specific keypoint labels. Using our temporal optimization, we achieve state-of-the-art performance for the challenging sequences in our dataset.
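To make the idea of a temporal loss concrete, the following is a minimal sketch of one common form of temporal regularization: penalizing squared differences between the parameters fitted to consecutive frames. The abstract does not specify the loss, so this is an assumption for illustration and not the authors' formulation. The function names `temporal_smoothness_loss` and `total_loss`, the flat per-frame parameter vectors, and the default weight of 0.1 are all hypothetical.

```python
from typing import Sequence


def temporal_smoothness_loss(params: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between consecutive per-frame parameter vectors.

    `params[t]` is a hypothetical flat vector of model parameters (for example
    pose values) for frame t. This is an illustrative smoothness term, not the
    loss defined in the paper.
    """
    if len(params) < 2:
        return 0.0  # a single frame has no temporal neighbours
    total = 0.0
    count = 0
    for prev, curr in zip(params, params[1:]):
        if len(prev) != len(curr):
            raise ValueError("all frames must have the same number of parameters")
        for a, b in zip(prev, curr):
            total += (b - a) ** 2
            count += 1
    return total / count if count else 0.0


def total_loss(
    per_frame_losses: Sequence[float],
    params: Sequence[Sequence[float]],
    weight: float = 0.1,  # assumed trade-off weight, chosen arbitrarily here
) -> float:
    """Sum of single-image fitting losses plus the weighted temporal term."""
    return sum(per_frame_losses) + weight * temporal_smoothness_loss(params)


if __name__ == "__main__":
    smooth = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
    jittery = [[0.0, 0.0], [1.0, -1.0], [0.0, 0.0]]
    # The jittery track receives the larger temporal penalty.
    print(temporal_smoothness_loss(smooth) < temporal_smoothness_loss(jittery))
```

In this sketch, the per-frame losses stand in for whatever a single-image estimator already minimizes (such as keypoint reprojection error), and the temporal term only couples neighbouring frames, so a constant parameter track incurs no extra penalty.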
