ExFMan: Rendering 3D Dynamic Humans with Hybrid Monocular Blurry Frames and Events

Kanghao Chen, Zeyu Wang, Lin Wang

TL;DR

ExFMan tackles rendering dynamic humans under motion blur from monocular videos by fusing hybrid RGB frames with asynchronous event data. It introduces a velocity field $\mathbf{v}(\mathbf{x};t)$ in canonical space to locate blur and develops two velocity-based losses, a velocity-aware photometric loss and a velocity-relative event loss, to jointly supervise RGB and event signals, complemented by pose regularization and a velocity-based alpha loss. On synthetic ZJU-MoCap data and real DAVIS346 data, ExFMan produces sharper renderings with clearer boundaries, with a reported 3.06 dB PSNR improvement over baselines on the synthetic dataset. By using event data to complement frame-based cues, the method demonstrates robust dynamic-human reconstruction in uncontrolled scenes and suggests a viable direction for blur-robust, NeRF-like rendering of fast human motion.
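
The idea of two complementary, velocity-guided losses can be illustrated with a small per-pixel sketch. The event term uses the standard event-camera generation model, in which each event signals a log-intensity change of about ±C. The exponential weighting in `velocity_weights`, the contrast threshold value, and all function and parameter names are hypothetical choices for illustration, not the paper's exact formulation.

```python
import math
from typing import Sequence, Tuple


def velocity_weights(speed: float, tau: float = 1.0) -> Tuple[float, float]:
    """Hypothetical velocity-based weights (not the paper's exact formula).

    A static pixel (speed 0) relies fully on RGB; as speed grows, weight
    shifts from the blurry RGB term to the event term.
    """
    w_rgb = math.exp(-speed / tau)  # 1.0 when static, decays toward 0 for fast motion
    return w_rgb, 1.0 - w_rgb       # the event weight is the complement


def hybrid_loss(
    pred: Sequence[float],          # rendered grayscale intensity at the frame time
    frame: Sequence[float],         # observed (possibly blurry) frame intensity
    pred_t0: Sequence[float],       # rendered intensity at the start of the event window
    pred_t1: Sequence[float],       # rendered intensity at the end of the event window
    polarity_sum: Sequence[int],    # signed event count per pixel within the window
    speed: Sequence[float],         # rendered velocity-map magnitude per pixel
    contrast: float = 0.2,          # assumed contrast threshold C
    tau: float = 1.0,               # assumed speed scale for the weighting
    eps: float = 1e-6,              # avoids log(0)
) -> float:
    """Mean over pixels of velocity-weighted photometric and event residuals."""
    total = 0.0
    for p, f, a, b, k, s in zip(pred, frame, pred_t0, pred_t1, polarity_sum, speed):
        w_rgb, w_evt = velocity_weights(s, tau)
        photometric = (p - f) ** 2
        # Event generation model: log I(t1) - log I(t0) is approx. C * (sum of polarities)
        log_change = math.log(b + eps) - math.log(a + eps)
        event = (log_change - contrast * k) ** 2
        total += w_rgb * photometric + w_evt * event
    return total / len(speed)
```

For a static pixel the loss reduces to the plain squared photometric error, while for a fast-moving pixel it is dominated by the event residual. Complementary weights are one simple way to express "trust events where the RGB frame is blurred"; the paper's actual weighting, and any blur model applied to the photometric term, may differ.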

Abstract

Recent years have witnessed tremendous progress in the 3D reconstruction of dynamic humans from monocular video with the advent of neural rendering techniques. This task has a wide range of applications, including the creation of virtual characters for virtual reality (VR) environments. However, it remains challenging to reconstruct clear humans when the monocular video is affected by motion blur, particularly blur caused by rapid human motion (e.g., running, dancing), as often occurs in the wild. Such blur leads to noticeable inconsistencies in the shape and appearance of the rendered 3D humans, especially in blurry, fast-moving regions such as the hands and legs. In this paper, we propose ExFMan, the first neural rendering framework that unveils the possibility of rendering high-quality humans in rapid motion with a hybrid of a frame-based RGB camera and a bio-inspired event camera. The "out-of-the-box" insight is to leverage the high temporal resolution of event data in a complementary manner and to adaptively reweight the losses for RGB frames and events in local regions according to the velocity of the rendered human. This significantly mitigates the inconsistency associated with motion blur in the RGB frames. Specifically, we first formulate a velocity field of the 3D body in the canonical space and render it to image space to identify the body parts with motion blur. We then propose two novel losses, i.e., a velocity-aware photometric loss and a velocity-relative event loss, to optimize the neural human for both modalities under the guidance of the estimated velocity. In addition, we incorporate novel pose regularization and alpha losses to encourage temporally continuous poses and clear boundaries. Extensive experiments on synthetic and real-world datasets demonstrate that ExFMan can reconstruct sharper, higher-quality humans.
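
The abstract mentions a pose regularization that encourages continuous poses. One common way to express such a prior is to penalize the second-order temporal difference (a discrete acceleration) of per-frame pose parameters. The sketch below assumes that form; `pose_smoothness` is a hypothetical name, and the paper's regularizer may be defined differently.

```python
from typing import Sequence


def pose_smoothness(poses: Sequence[Sequence[float]]) -> float:
    """Mean squared second-order temporal difference of per-frame pose vectors.

    Assumption: a zero-acceleration prior between consecutive frames; the
    paper's pose regularization may use a different form.
    """
    if len(poses) < 3:
        return 0.0  # acceleration needs at least three frames
    total = 0.0
    for prev, cur, nxt in zip(poses, poses[1:], poses[2:]):
        # discrete acceleration per pose parameter: p[t-1] - 2 p[t] + p[t+1]
        total += sum((a - 2.0 * b + c) ** 2 for a, b, c in zip(prev, cur, nxt))
    return total / (len(poses) - 2)
```

A linear pose trajectory incurs zero penalty, while a jittery one is penalized. Treating SMPL axis-angle parameters as plain vectors is a simplification; a rotation-aware difference would be more faithful for large rotations.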

Paper Structure

This paper contains 17 sections, 14 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Our framework facilitates human reconstruction from blurry frames and event data, based on a velocity field. For a given timestamp $t$ and 3D point $\mathbf{x}$ along the sampled ray $\mathbf{r}$ in the observation space, the deformation mapping deforms $\mathbf{x}$ into its canonical counterpart $\hat{\mathbf{x}}$ according to the pose $\mathbf{p}(t)$. The velocity $\mathbf{v}$ of the sampled point $\mathbf{x}$ at time $t$ is calculated as the derivative of the deformed point $\hat{\mathbf{x}}$ w.r.t. the timestamp $t$, and is then projected onto the human surface using the normal vector $\mathbf{n}$ of the SMPL model. The velocity map is obtained through volumetric rendering. Based on this velocity map, we introduce a velocity-aware photometric loss and a velocity-relative event loss, designed to leverage both data modalities in a complementary manner.
  • Figure 2: Left: Comparison of different implementations of the velocity field. (b) NeRF-W fails to yield a meaningful velocity with implicit uncertainty. (c) The velocity without surface projection highlights the wrong regions, which have less motion blur. Right: Illustration of how the velocity is computed from the human model during rapid motion.
  • Figure 3: Qualitative results of novel view synthesis on the ZJU-MoCap dataset (Peng et al., 2021).
  • Figure 4: Qualitative results of novel view synthesis on our real-world dataset.
  • Figure 5: Our velocity-aware photometric loss improves rendering quality in regions with motion blur ((a) vs. (b)).
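
The velocity pipeline in Figure 1 can be sketched for a single ray: differentiate the canonical point $\hat{\mathbf{x}}$ with respect to $t$ for a fixed observation-space point $\mathbf{x}$, project the result using the surface normal, and composite per-sample speeds with standard volume-rendering weights. Assumptions: a central finite difference stands in for automatic differentiation, "projection onto the surface" is read as removing the component along the unit normal $\mathbf{n}$, and speeds rather than vectors are composited. The function names and the toy deformation are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vec3 = List[float]


def canonical_velocity(
    deform: Callable[[Vec3, float], Vec3], x: Vec3, t: float, h: float = 1e-4
) -> Vec3:
    """d x_hat / d t at fixed observation point x, by central finite differences."""
    fwd = deform(x, t + h)
    bwd = deform(x, t - h)
    return [(f - b) / (2.0 * h) for f, b in zip(fwd, bwd)]


def project_to_surface(v: Vec3, n: Vec3) -> Vec3:
    """Assumed reading of surface projection: drop the component along the unit normal."""
    norm = math.sqrt(sum(c * c for c in n))
    unit = [c / norm for c in n]
    dot = sum(a * b for a, b in zip(v, unit))
    return [a - dot * u for a, u in zip(v, unit)]


def render_velocity_map(
    speeds: Sequence[float], densities: Sequence[float], deltas: Sequence[float]
) -> float:
    """NeRF-style compositing of per-sample speeds along one ray."""
    transmittance = 1.0
    out = 0.0
    for s, sigma, d in zip(speeds, densities, deltas):
        alpha = 1.0 - math.exp(-sigma * d)  # opacity of this sample
        out += transmittance * alpha * s    # weight the speed by its visibility
        transmittance *= 1.0 - alpha        # light remaining for later samples
    return out
```

With a toy deformation $\hat{\mathbf{x}} = \mathbf{x} - (2t, 0, 0)$, i.e., a body translating along the x-axis, the canonical velocity is $(-2, 0, 0)$; its projected part vanishes when the normal is aligned with the motion and is kept intact when the normal is orthogonal to it.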