Temporally-consistent 3D Reconstruction of Birds

Johannes Hägerlind, Jonas Hentati-Sundberg, Bastian Wandt

TL;DR

This work tackles reconstructing 3D pose and shape of common murres from monocular video, addressing rapid, non-rigid bird motion. It introduces a modular pipeline—detection, segmentation, tracking, and temporally consistent 3D fitting of a parametric bird model—augmented with a temporal loss to enforce smooth motion. A real-world Baltic Seabird Project dataset with 10K frames (roughly nine birds per frame) and a smaller labeled test set enables assessment of both 2D and 3D consistency. Results show that temporal optimization improves 2D keypoint reprojection errors and yields more plausible motion, achieving state-of-the-art performance on challenging sequences in the dataset.

Abstract

This paper deals with the 3D reconstruction of seabirds, which have recently come into the focus of environmental scientists as valuable bio-indicators of environmental change. Such 3D information is beneficial for analyzing the birds' behavior and physiological shape, for example by tracking motion, shape, and appearance changes. From a computer vision perspective, birds are especially challenging due to their rapid and oftentimes non-rigid motions. We propose an approach to reconstruct the 3D pose and shape from monocular videos of a specific species of seabird, the common murre. Our approach comprises a full pipeline of detection, tracking, segmentation, and temporally consistent 3D reconstruction. Additionally, we propose a temporal loss that extends current single-image 3D bird pose estimators to the temporal domain. Moreover, we provide a real-world dataset of 10000 frames of video observations that on average capture nine birds simultaneously, comprising a large variety of motions and interactions, including a smaller test set with bird-specific keypoint labels. Using our temporal optimization, we achieve state-of-the-art performance for the challenging sequences in our dataset.

Paper Structure (13 sections, 6 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: The proposed pipeline. The pink box represents learning the 3D pose prior [wang2021birds]. The blue boxes represent the fitting of the parameterized model and the prediction of segmentation masks, inspired by [hagerlind20233d]. The orange boxes represent additional improvements made in the current work. The green boxes show the integration of temporal information, which is the main contribution of this work.
  • Figure 2: Example reconstruction. The odd rows show the input image. The even rows show the corresponding mesh for the tracked bird rendered on top of the background image. The texture of the reconstructed bird is only added for visualization purposes.