HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang, Wu Liu, Xinchen Liu, Zheng Wang
TL;DR
HumanNeRF-SE presents a streamlined architecture that fuses explicit SMPL priors with an implicit NeRF to animate humans across diverse poses from monocular, few-shot inputs. By voxelizing the SMPL space, applying a Conv-Filter to prune irrelevant points, and refining point-wise canonical coordinates with spatial-aware features, the method achieves strong pose generalization while substantially reducing learnable parameters and training time. The approach delivers high-quality renderings with fewer artifacts than prior methods, and demonstrates notable speedups without external acceleration modules. Its reliance on readily available SMPL information and a simple yet effective design makes it practical for industrial video production and near-real-time applications, especially under limited data regimes.
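The point-pruning step mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`voxelize`, `box_filter`, `prune_points`), the all-ones kernel, and the grid parameters are assumptions chosen for illustration. The idea shown is that SMPL vertices are written into an occupancy grid, a small convolution spreads that occupancy to neighboring voxels, and sampled ray points that land in empty voxels are discarded before they reach the NeRF network.

```python
import numpy as np


def voxelize(vertices, grid_min, voxel_size, grid_shape):
    """Mark every voxel that contains at least one mesh vertex."""
    occ = np.zeros(grid_shape, dtype=np.float32)
    idx = np.floor((vertices - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx = idx[inside]
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return occ


def box_filter(occ, radius=1):
    """Convolve the grid with an all-ones (2r+1)^3 kernel (zero padding)."""
    padded = np.pad(occ, radius)  # constant zero padding by default
    out = np.zeros_like(occ)
    k = 2 * radius + 1
    X, Y, Z = occ.shape
    for dx in range(k):
        for dy in range(k):
            for dz in range(k):
                out += padded[dx:dx + X, dy:dy + Y, dz:dz + Z]
    return out


def prune_points(points, vertices, grid_min, voxel_size, grid_shape, radius=1):
    """Return a boolean mask: True for points close to the body surface."""
    occ = voxelize(vertices, grid_min, voxel_size, grid_shape)
    near_body = box_filter(occ, radius) > 0
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    keep = np.zeros(len(points), dtype=bool)  # points outside the grid are dropped
    ii = idx[inside]
    keep[inside] = near_body[ii[:, 0], ii[:, 1], ii[:, 2]]
    return keep


# Toy example: one "body" vertex and three sampled points.
vertices = np.array([[1.1, 1.1, 1.1]])
points = np.array([
    [1.3, 1.1, 1.1],  # neighboring voxel of the vertex
    [0.1, 0.1, 0.1],  # far from the vertex
    [5.0, 5.0, 5.0],  # outside the grid
])
mask = prune_points(points, vertices, grid_min=np.zeros(3),
                    voxel_size=0.25, grid_shape=(8, 8, 8))
```

Only the points selected by `mask` would be passed on to the network in this sketch, which is where the reduction in computation comes from.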
Abstract
We present HumanNeRF-SE, a simple yet effective method that synthesizes images of a human in diverse novel poses from simple input. Previous HumanNeRF works require a large number of optimizable parameters to fit the human images. Instead, we revisit these approaches by combining explicit and implicit human representations to design both a generalized rigid deformation and a specific non-rigid deformation. Our key insight is twofold: the explicit shape can reduce the number of sampling points used to fit the implicit representation, and a generalized rigid deformation built from frozen SMPL blending weights can effectively avoid overfitting and improve pose generalization. The resulting architecture, which involves both explicit and implicit representations, is simple yet effective. Experiments demonstrate that our model can synthesize images under arbitrary poses from few-shot input, and that it synthesizes images 15 times faster by reducing computational complexity, without using any existing acceleration modules. Compared to state-of-the-art HumanNeRF studies, HumanNeRF-SE achieves better performance with fewer learnable parameters and less training time.
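To make the idea of a rigid deformation built from frozen blending weights concrete, here is a minimal NumPy sketch of inverse linear blend skinning. It is a sketch under stated assumptions, not the paper's code: the helper names (`nearest_vertex_weights`, `inverse_lbs`) are hypothetical, and taking each point's weights from its nearest SMPL vertex is a simplification chosen for illustration. The relevant property is that the weights are looked up rather than learned, so the mapping from observation space to canonical space has no trainable parameters that could overfit to the training poses.

```python
import numpy as np


def nearest_vertex_weights(points, vertices, vertex_weights):
    """Copy the (frozen) blend weights of the nearest mesh vertex to each point.

    points: (N, 3), vertices: (V, 3), vertex_weights: (V, K) -> returns (N, K)
    """
    d2 = ((points[:, None, :] - vertices[None, :, :]) ** 2).sum(axis=-1)
    return vertex_weights[d2.argmin(axis=1)]


def inverse_lbs(points, weights, bone_transforms):
    """Map observation-space points back to canonical space.

    points: (N, 3)
    weights: (N, K), each row sums to 1
    bone_transforms: (K, 4, 4), canonical -> observation, one per bone
    """
    # Blend the per-bone transforms with the per-point weights.
    blended = np.einsum("nk,kij->nij", weights, bone_transforms)  # (N, 4, 4)
    # Invert each blended transform to go observation -> canonical.
    inv = np.linalg.inv(blended)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    canonical = np.einsum("nij,nj->ni", inv, homo)
    return canonical[:, :3]


# Toy skeleton with two bones: one fixed, one translated by +1 along x.
bone_transforms = np.stack([np.eye(4), np.eye(4)])
bone_transforms[1, 0, 3] = 1.0

# Two mesh vertices: one bound to bone 1, one shared equally by both bones.
vertices = np.array([[1.0, 2.0, 3.0], [1.0, -2.0, 3.0]])
vertex_weights = np.array([[0.0, 1.0], [0.5, 0.5]])

points = np.array([[1.0, 2.1, 3.0], [1.0, -2.1, 3.0]])
weights = nearest_vertex_weights(points, vertices, vertex_weights)
canonical = inverse_lbs(points, weights, bone_transforms)
```

In this toy setup the first point follows bone 1 and moves back by a full unit along x, while the second moves back by half a unit because its weights are split between the two bones.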
