Towards Texture- And Shape-Independent 3D Keypoint Estimation in Birds

Valentin Schmuker, Alex Hoi Hang Chan, Bastian Goldluecke, Urs Waldmann

TL;DR

This work introduces a texture-independent extension of the 3D-MuPPET framework to estimate and track 3D bird poses using silhouettes. By replacing texture-based inputs with silhouette-driven 2D keypoints sourced from DeepLabCut trained on segmentation masks (via SAM prompts) and performing multi-view triangulation, the method handles appearance variations while retaining multi-animal tracking for up to 10 birds. Quantitative results show the texture-independent approach achieves competitive 3D pose accuracy relative to the texture-dependent baseline and demonstrates promising preliminary generalization to other bird species, including ravens. The study also analyzes ablations and outlines limitations and directions for future improvements, including segmentation quality, temporal coherence, and broader species transfer.

Abstract

In this paper, we present a texture-independent approach to estimate and track 3D joint positions of multiple pigeons. For this purpose, we build upon the existing 3D-MuPPET framework, which estimates and tracks the 3D poses of up to 10 pigeons using a multi-view camera setup. We extend this framework by using a segmentation method that generates silhouettes of the individuals, which are then used to estimate 2D keypoints. Following 3D-MuPPET, these 2D keypoints are triangulated to infer 3D poses, and identities are matched in the first frame and tracked in 2D across subsequent frames. Our proposed texture-independent approach achieves comparable accuracy to the original texture-dependent 3D-MuPPET framework. Additionally, we explore our approach's applicability to other bird species. To do that, we infer the 2D joint positions of four bird species without additionally fine-tuning the model trained on pigeons and obtain promising preliminary results. Thus, we think that our approach serves as a solid foundation and inspires the development of more robust and accurate texture-independent pose estimation frameworks.
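
To make the triangulation step concrete, the following is a minimal sketch of standard linear (DLT) triangulation of a single keypoint from several calibrated views. It is an illustration under stated assumptions, not the 3D-MuPPET implementation: the function names `project` and `triangulate_point`, and the use of 3x4 projection matrices as input, are hypothetical choices made for this example.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P to pixels."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one keypoint seen in two or more views.

    projections: sequence of 3x4 camera projection matrices (assumed known
                 from calibration).
    points_2d:   sequence of (x, y) pixel coordinates, one per view.
    Returns the 3D point as an array of shape (3,).
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector belonging to
    # the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]
```

In a multi-animal setting, a function like this would be called once per keypoint and per individual, after the 2D detections from the different views have been assigned to the same identity.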

Paper Structure

This paper contains 12 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Qualitative results of bird poses using our texture-independent approach. Top: Multi-pigeon pose estimation and tracking in 3D, projected to 2D. Green lines connect the body, blue lines the head keypoints. The detected bounding boxes are red. Example frame from 3D-POP Naik:2023. Bottom left and right: Species transfer to turtle doves and ravens, respectively. Detected bounding box in red. Both examples from Ng:2022.
  • Figure 2: Texture independence. Two examples of our mask generation from textured images (left) using DLCSAM (middle) and DLCISO (right), cf. \ref{sec:framework:3d-muppet}. Example frames from Naik:2023.