Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data

Moira Shooter, Charles Malleson, Adrian Hilton

TL;DR

This work tackles monocular 3D dog pose estimation by introducing two datasets, 3DDogs-Lab and 3DDogs-Wild, to study domain shift between lab mocap data and in-the-wild imagery. It presents a thorough benchmark across multiple pose-estimation models, showing that training on the in-the-wild dataset improves generalization to real-world data and across species, with D-Pose variants (notably with the DINOv2 backbone) often delivering the strongest performance. The datasets and analyses highlight strengths and limitations of current approaches, emphasize the importance of diverse training data for robust RGB-based 3D animal pose estimation, and demonstrate cross-species transfer potential using Animals3D. Overall, the work provides valuable resources and insights to advance markerless, monocular 3D animal pose estimation in practical settings.

Abstract

We introduce a new benchmark analysis focusing on 3D canine pose estimation from monocular in-the-wild images. A multi-modal dataset, 3DDogs-Lab, was captured indoors, featuring various dog breeds trotting on a walkway. It includes data from optical marker-based mocap systems, RGBD cameras, IMUs, and a pressure mat. Although this setup provides high-quality motion data, the presence of optical markers and the limited background diversity make the captured video less representative of real-world conditions. To address this, we created 3DDogs-Wild, a naturalised version of the dataset where the optical markers are in-painted and the subjects are placed in diverse environments, enhancing its utility for training RGB image-based pose detectors. We show that training the models on 3DDogs-Wild leads to improved performance when evaluating on in-the-wild data. Additionally, we provide a thorough analysis using various pose estimation models, revealing their respective strengths and weaknesses. We believe that our findings, coupled with the datasets provided, offer valuable insights for advancing 3D animal pose estimation.

Paper Structure

This paper contains 10 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Qualitative results of different pose estimation methods trained on the 3DDogs-Wild dataset.
  • Figure 2: Samples of 3DDogs-Lab showcasing the original motion capture sequences vs. their in-the-wild versions.
  • Figure 3: Qualitative results on samples of the Animals3D (Xu et al., ICCV 2023) test set from D-Pose (DINOv2-S) trained only on the 3DDogs-Wild dataset. The pose is viewed from different angles.
  • Figure 4: Background generation artifacts such as extra limb hallucinations and inconsistencies across frames.
  • Figure 5: Keypoint differences between the 3DDogs and Animals3D datasets, with sample predictions from the D-Pose (DINOv2-S) model shown alongside ground truth from the Animals3D test set.
  • ...and 3 more figures