Real-Time Human Frontal View Synthesis from a Single Image

Fangyu Lin; Yingdong Hu; Lunjie Zhu; Zhening Liu; Yushi Huang; Zehong Lin; Jun Zhang

Abstract

Photorealistic human novel view synthesis from a single image is crucial for democratizing immersive 3D telepresence, eliminating the need for complex multi-camera setups. However, current rendering-centric methods prioritize visual fidelity over explicit geometric understanding and struggle with intricate regions like faces and hands, leading to temporal instability. Meanwhile, human-centric frameworks suffer from memory bottlenecks since they typically rely on an auxiliary model to provide informative structural priors for geometric modeling, which limits real-time performance. To address these challenges, we propose PrismMirror, a geometry-guided framework for instant frontal view synthesis from a single image. By avoiding external geometric modeling and focusing on frontal view synthesis, our model optimizes visual integrity for telepresence. Specifically, PrismMirror introduces a novel cascade learning strategy that enables coarse-to-fine geometric feature learning. It first directly learns coarse geometric features, such as SMPL-X meshes and point clouds, and then refines textures through rendering supervision. To achieve real-time efficiency, we distill this unified framework into a lightweight linear attention model. Notably, PrismMirror is the first monocular human frontal view synthesis model that achieves real-time inference at 24 FPS, significantly outperforming previous methods in both visual authenticity and structural accuracy.
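The abstract says the framework is distilled into a lightweight linear attention model but does not specify the attention variant here. As a rough illustration of why linear attention is cheaper, the following is a minimal sketch of generic kernelized linear attention. It assumes an elu(x) + 1 feature map, which is a common choice and not confirmed as PrismMirror's. The names feature_map, linear_attention, and quadratic_reference are hypothetical. The sketch accumulates the key-value summary once and reuses it for every query, so the cost grows linearly with the number of tokens instead of quadratically.

```python
import math


def feature_map(x):
    # elu(x) + 1, a positive feature map often used for linear attention.
    # Assumption: the paper's actual feature map is not given in this section.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(Q, K, V):
    """Non-causal kernelized attention in time linear in the token count."""
    phi_K = [feature_map(k) for k in K]
    d, d_v = len(phi_K[0]), len(V[0])
    # S accumulates sum_j phi(k_j) v_j^T (d x d_v); z accumulates sum_j phi(k_j).
    S = [[0.0] * d_v for _ in range(d)]
    z = [0.0] * d
    for pk, v in zip(phi_K, V):
        for a in range(d):
            z[a] += pk[a]
            for b in range(d_v):
                S[a][b] += pk[a] * v[b]
    out = []
    for q in Q:
        pq = feature_map(q)
        norm = sum(pq[a] * z[a] for a in range(d))
        out.append(
            [sum(pq[a] * S[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return out


def quadratic_reference(Q, K, V):
    """Same kernel attention with explicit N x N weights, used only as a check."""
    d_v = len(V[0])
    out = []
    for q in Q:
        pq = feature_map(q)
        w = [sum(a * b for a, b in zip(pq, feature_map(k))) for k in K]
        total = sum(w)
        out.append(
            [sum(wj * v[b] for wj, v in zip(w, V)) / total for b in range(d_v)]
        )
    return out


# Toy inputs: 3 tokens, key/query dimension 2, value dimension 2.
Q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
K = [[0.1, 0.4], [-0.7, 1.2], [2.0, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = linear_attention(Q, K, V)
slow = quadratic_reference(Q, K, V)
```

Both functions compute the same output up to floating-point rounding. The difference is only in the order of operations: the linear form never materializes the token-by-token weight matrix.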

Paper Structure (23 sections, 15 equations, 13 figures, 4 tables)

Introduction
Related Works
Human Novel View Synthesis
Feed-forward NVS and 3D Reconstruction
Human Mesh Reconstruction
Method
Preliminary
Cascaded Model Architecture
Learning Objective
Model Distillation
Experiment
Experimental Setup
Comparison
Ablation Study
Conclusion
...and 8 more sections

Figures (13)

Figure 1: Real-time high-fidelity human frontal view synthesis. PrismMirror achieves a $5\times$ inference speedup ($\sim$42 ms vs. 206.4 ms) over HumanRAM, while maintaining superior visual quality and reaching real-time frame rates of 24+ FPS.
Figure 2: Overview of the PrismMirror architecture. The framework operates through three cascaded stages: encoding global context, injecting explicit geometric priors (SMPL-X and point clouds), and decoding into NVS or 3DGS (accelerated by a progressive linear attention distillation strategy).
Figure 3: Visualization of geometry feature injection. By generating explicit spatial priors (point clouds and SMPL-X) inside the model, our architecture effectively anchors pure data-driven features to capture precise body topology and high-frequency details.
Figure 4: Progressive distillation.
Figure 5: Qualitative comparisons on THuman2.1 and THumanSit. PrismMirror synthesizes sharper details in high-frequency regions (e.g., faces and hands) compared to baselines, successfully avoiding severe floating artifacts.
...and 8 more figures
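The speed figures quoted in the Figure 1 caption can be cross-checked with simple arithmetic. The sketch below takes the two latencies from the caption and treats the "~42 ms" value as exactly 42.0 ms, which is an assumption since the caption only gives it approximately. The variable names are illustrative.

```python
# Latencies from the Figure 1 caption, in milliseconds per frame.
humanram_ms = 206.4
prismmirror_ms = 42.0  # assumption: the caption says "~42 ms", not an exact value

# Speedup is the ratio of the baseline latency to the new latency.
speedup = humanram_ms / prismmirror_ms

# Frames per second is the reciprocal of the per-frame latency.
fps = 1000.0 / prismmirror_ms

# Latency budget that corresponds to exactly 24 frames per second.
budget_ms_24fps = 1000.0 / 24.0

print(f"speedup: {speedup:.2f}x")
print(f"throughput at 42 ms: {fps:.1f} FPS")
print(f"latency budget for 24 FPS: {budget_ms_24fps:.1f} ms")
```

With these inputs the ratio is about 4.9, which the caption rounds to 5x. A latency of exactly 42 ms corresponds to about 23.8 FPS, and 24 FPS requires about 41.7 ms per frame, so the "24+ FPS" claim is consistent with the caption only if the measured latency is slightly under the rounded 42 ms.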
