PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng, Liwei Liao, Xufeng Li, Jianbo Jiao, Rongjie Wang, Feng Gao, Shiqi Wang, Ronggang Wang
TL;DR
PKU-DyMVHumans addresses the need for high-fidelity dynamic human data to advance reconstruction and photo-realistic rendering. It introduces a dense multi-view dataset with 32 subjects, 45 dynamic scenarios, and 8.2 million frames captured by 56–60 cameras, plus a unified benchmark framework that evaluates NeRF-based methods with metrics such as PSNR, SSIM, and LPIPS. The paper benchmarks novel view synthesis, dynamic human modeling, and neural scene decomposition, revealing strengths of hash-encoded NeRFs and challenges from loose clothing, complex motions, and multi-person interactions. The dataset and benchmark provide a practical resource for developing robust dynamic human representations and guide future improvements in multi-view capture and neural rendering.
Abstract
High-quality human reconstruction and photo-realistic rendering of dynamic scenes is a long-standing problem in computer vision and graphics. Despite considerable effort invested in developing capture systems and reconstruction algorithms, recent methods still struggle with loose or oversized clothing and overly complex poses. In part, this is due to the difficulty of acquiring high-quality human datasets. To facilitate progress in these fields, we present PKU-DyMVHumans, a versatile human-centric dataset for high-fidelity reconstruction and rendering of dynamic human scenarios from dense multi-view videos. It comprises 8.2 million frames captured by more than 56 synchronized cameras. The sequences cover 32 human subjects across 45 different scenarios, each with highly detailed appearance and realistic human motion. Inspired by recent advances in neural radiance field (NeRF)-based scene representations, we carefully set up an off-the-shelf framework that makes it easy to run state-of-the-art NeRF-based implementations and benchmark them on the PKU-DyMVHumans dataset. This paves the way for applications such as fine-grained foreground/background decomposition, high-quality human reconstruction, and photo-realistic novel view synthesis of dynamic scenes. Extensive studies are performed on the benchmark, demonstrating new observations and challenges that emerge from using such high-fidelity dynamic data.
