DogMo: A Large-Scale Multi-View RGB-D Dataset for 4D Canine Motion Recovery

Zan Wang; Siyu Chen; Luya Mo; Xinfeng Gao; Yuxin Shen; Lebin Ding; Wei Liang

TL;DR

DogMo is a large-scale, multi-view RGB-D dataset for 4D canine motion recovery, comprising 1.2k motion sequences of 10 dogs performing 11 actions, captured by 5 synchronized cameras. The authors propose a three-stage, instance-specific optimization that fits the D-SMAL model to multi-view and RGB-D inputs, using losses such as Chamfer Mask, CSE Keypoints, Leg Cross, and temporal regularization to progressively refine shape, pose, and motion. Four benchmark settings, covering monocular and multi-view inputs in both RGB and RGB-D, enable systematic evaluation of dog motion recovery. Experiments show improvements over baselines such as SMALify and AnimalAvatar, with notable gains when depth and multiple views are available. The work highlights the potential of combining robust 3D priors, dense correspondences, and temporal consistency for accurate, plausible 4D dog motion reconstruction, with implications for animation, VR/AR, and animal behavior modeling.

Abstract

We present DogMo, a large-scale multi-view RGB-D video dataset capturing diverse canine movements for the task of motion recovery from images. DogMo comprises 1.2k motion sequences collected from 10 unique dogs, offering rich variation in both motion and breed. It addresses key limitations of existing dog motion datasets, including the lack of multi-view and real 3D data, as well as limited scale and diversity. Leveraging DogMo, we establish four motion recovery benchmark settings that support systematic evaluation across monocular and multi-view, RGB and RGB-D inputs. To facilitate accurate motion recovery, we further introduce a three-stage, instance-specific optimization pipeline that fits the SMAL model to the motion sequences. Our method progressively refines body shape and pose through coarse alignment, dense correspondence supervision, and temporal regularization. Our dataset and method provide a principled foundation for advancing research in dog motion recovery and open up new directions at the intersection of computer vision, computer graphics, and animal behavior modeling.
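The summary above names a Chamfer mask loss and temporal regularization among the terms used during fitting, but this page does not give their definitions. As a rough illustration only, the sketch below shows one common way such terms are written: a symmetric Chamfer distance between two 2D point sets (for example, rendered and observed silhouette points) and a penalty on frame-to-frame pose change. The function names, the weights `w_mask` and `w_temp`, and the way the terms are combined are assumptions made for this example, not the authors' formulation.

```python
from math import dist


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 2D point sets.

    Each point is matched to its nearest neighbour in the other set and the
    squared distances are averaged in both directions.
    """
    a_to_b = sum(min(dist(p, q) ** 2 for q in b) for p in a) / len(a)
    b_to_a = sum(min(dist(q, p) ** 2 for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def temporal_smoothness(poses):
    """Mean squared change between consecutive pose-parameter vectors."""
    if len(poses) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(poses, poses[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(poses) - 1)


def total_loss(rendered_pts, observed_pts, poses, w_mask=1.0, w_temp=0.1):
    """Hypothetical weighted sum of the two terms (weights are placeholders)."""
    return (w_mask * chamfer_distance(rendered_pts, observed_pts)
            + w_temp * temporal_smoothness(poses))
```

In an actual fitting pipeline these terms would be evaluated on differentiable model outputs and minimized with a gradient-based optimizer; the plain-Python version here is only meant to make the two quantities concrete.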
