Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement
Chia-Hern Lai, I-Hsuan Lo, Yen-Ku Yeh, Thanh-Nguyen Truong, Ching-Chun Huang
TL;DR
Blur2Sharp synthesizes photorealistic, geometrically consistent human avatars under novel poses and views from a single image. It fuses a generalizable Human NeRF with a diffusion-based generative prior, guided by multi-layer SMPL priors (texture, normal, semantic) and a reference-knowledge transfer mechanism. The method introduces a dual-domain RGB-Normal diffusion model and a Multi-Layer Geometry Fusion module to balance global structure with fine detail, producing sharp, view-consistent outputs even for loose clothing and occlusions. On two large datasets, Blur2Sharp achieves state-of-the-art performance on both novel pose and novel view synthesis and generalizes well, and ablations support the contribution of each component.
Abstract
The creation of lifelike human avatars capable of realistic pose variation and viewpoint flexibility remains a fundamental challenge in computer vision and graphics. Current approaches typically either yield geometrically inconsistent multi-view images or sacrifice photorealism, resulting in blurry outputs under diverse viewing angles and complex motions. To address these issues, we propose Blur2Sharp, a novel framework integrating 3D-aware neural rendering and diffusion models to generate sharp, geometrically consistent novel-view images from only a single reference view. Our method employs a dual-conditioning architecture: first, a Human NeRF model generates geometrically coherent multi-view renderings for target poses, explicitly encoding 3D structural guidance. Then, a diffusion model conditioned on these renderings refines the generated images, preserving fine-grained details and structural fidelity. We further enhance visual quality through hierarchical feature fusion, incorporating texture, normal, and semantic priors extracted from parametric SMPL models to improve global coherence and local detail accuracy simultaneously. Extensive experiments demonstrate that Blur2Sharp consistently surpasses state-of-the-art techniques in both novel pose and novel view generation, particularly in challenging scenarios involving loose clothing and occlusions.
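The two-stage data flow described in the abstract can be illustrated with a shape-level toy sketch. This is not the authors' implementation: the function names (`coarse_render`, `build_condition`, `refine`, `synthesize`), the channel layout, and the placeholder behavior of each stage are assumptions made purely for illustration. The Human NeRF and the diffusion refiner are replaced by trivial stand-ins so that only the conditioning structure is visible.

```python
import numpy as np

def coarse_render(reference_rgb, num_views):
    """Stage 1 placeholder (hypothetical): stands in for the Human NeRF.
    Returns one coarse RGB render per target view."""
    h, w, _ = reference_rgb.shape
    # Toy behavior: fill every view with the mean color of the reference.
    mean_color = reference_rgb.mean(axis=(0, 1))
    return np.broadcast_to(mean_color, (num_views, h, w, 3)).copy()

def build_condition(coarse, texture, normal, semantic):
    """Stack the coarse render and the SMPL-derived maps along channels.
    Assumed layout: coarse 0:3, texture 3:6, normal 6:9, semantic 9:10."""
    return np.concatenate([coarse, texture, normal, semantic], axis=-1)

def refine(condition):
    """Stage 2 placeholder (hypothetical): stands in for the diffusion refiner.
    Returns an RGB image and a normal map per view (dual-domain output)."""
    rgb = condition[..., 0:3]  # toy: pass the coarse render through
    nrm = condition[..., 6:9]  # toy: pass the SMPL normal map through
    return rgb, nrm

def synthesize(reference_rgb, texture, normal, semantic):
    """Run both stages: coarse rendering, then conditioned refinement."""
    num_views = texture.shape[0]
    coarse = coarse_render(reference_rgb, num_views)
    condition = build_condition(coarse, texture, normal, semantic)
    return refine(condition)

# Toy inputs: 4 target views at 32x32 resolution (arbitrary sizes).
rng = np.random.default_rng(0)
V, H, W = 4, 32, 32
reference = rng.random((H, W, 3))
texture = rng.random((V, H, W, 3))
normal = rng.random((V, H, W, 3))
semantic = rng.random((V, H, W, 1))

rgb, nrm = synthesize(reference, texture, normal, semantic)
print(rgb.shape, nrm.shape)
```

In a real system each placeholder would be a learned model. The sketch only shows that the refiner receives the coarse render together with the texture, normal, and semantic maps, and emits outputs in both the RGB and normal domains.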
