FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu

TL;DR

We propose a face-tailored, scale-aware representation of camera transformations that provides deterministic conditioning without relying on 3D priors, and introduce FaceCam, a system that generates video under customizable camera trajectories from a monocular human portrait video.

Abstract

We introduce FaceCam, a system that generates video under customizable camera trajectories from a monocular human portrait video. Recent camera-control approaches based on large video-generation models have shown promising progress but often exhibit geometric distortions and visual artifacts on portrait videos due to scale-ambiguous camera representations or 3D reconstruction errors. To overcome these limitations, we propose a face-tailored, scale-aware representation of camera transformations that provides deterministic conditioning without relying on 3D priors. We train a video generation model on both multi-view studio captures and in-the-wild monocular videos, and introduce two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, that exploit stationary training cameras while generalizing to dynamic, continuous camera trajectories at inference time. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
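The abstract attributes artifacts in prior methods partly to scale-ambiguous camera representations. The sketch below illustrates that ambiguity with a plain pinhole model and hypothetical 3D landmark coordinates; it is not the authors' code, and how FaceCam obtains or renders its landmarks is not described here. A camera given only as rotation R and translation t produces the same image of a head scaled by s when t is also scaled by s, while the same (R, t) frames heads of different sizes differently. Projected landmark positions, by contrast, directly state where the face appears in the target frame, which is the intuition behind conditioning on image-space correspondences.

```python
import math


def rot_y(angle_deg):
    """3x3 rotation matrix about the vertical (y) axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(points, rotation, translation, focal=500.0, cx=256.0, cy=256.0):
    """Pinhole projection of world-space 3D points into pixel coordinates.

    Camera coordinates are X_c = R X + t; pixels are (f*x/z + cx, f*y/z + cy).
    """
    pixels = []
    for p in points:
        xc, yc, zc = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        pixels.append((focal * xc / zc + cx, focal * yc / zc + cy))
    return pixels


def scale(points, s):
    """Uniformly scale 3D points about the world origin."""
    return [tuple(s * c for c in p) for p in points]


# Hypothetical 3D landmarks (assumed units): eye corners, nose tip, chin.
LANDMARKS = [(-0.03, 0.03, 0.0), (0.03, 0.03, 0.0), (0.0, 0.0, -0.02), (0.0, -0.06, 0.0)]

R = rot_y(15.0)
t = (0.0, 0.0, 0.5)  # camera placed 0.5 units in front of the head

base = project(LANDMARKS, R, t)
# Same camera parameters, but the head is twice as large: the framing changes.
bigger_head = project(scale(LANDMARKS, 2.0), R, t)
# Head AND translation scaled by 2: the image is unchanged (scale ambiguity).
both_scaled = project(scale(LANDMARKS, 2.0), R, [2.0 * c for c in t])
```

Because (R, t) alone cannot distinguish the last two cases from the first without knowing the head's metric size, a parameter-only condition leaves the apparent face size under-determined; the 2D landmark positions do not have this ambiguity.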

Paper Structure (28 sections, 16 equations, 11 figures, 4 tables, 3 algorithms)

Introduction
Related Work
  Human Face View Synthesis
  Camera-Control Video Generation
Method
  Problem Setup
  Camera Representation via Correspondences
  Scale-Aware Camera Conditioning
  Training Data Generation
  Inference Pipeline
Experiments
  Experimental Setup
  Experiments on Ava-256
  Experiments on In-the-wild Portrait Videos
Conclusion
...and 13 more sections

Figures (11)

Figure 1: FaceCam generates portrait videos with precise camera control from a single input video and a target camera trajectory. We introduce scale-aware camera conditioning that represents the target camera via rendered facial landmarks, enabling accurate camera pose control. Our approach preserves subject identity and motion while maintaining high visual quality. Project page: https://weijielyu.github.io/FaceCam.
Figure 2: Camera representation comparison. We contrast (A) parameter-based representations, which are standard in camera control methods, with (B) image-space point correspondences, which we adopt in FaceCam to obtain a scale-aware conditioning that enables precise camera control.
Figure 3: Training and inference pipeline of FaceCam.
Figure 4: Training data generation examples. Scale and color augmentation are applied to the source video to increase data diversity, while all three augmentation types are applied to the target video to train the model's camera-control capability.
Figure 5: Qualitative results on Ava-256. FaceCam produces more realistic novel views that align more closely with the ground truth than those of the baselines. ReCamMaster often fails under large pose changes, pushing the head out of frame, while TrajectoryCrafter frequently shows facial distortions caused by errors in its dynamic point cloud.
...and 6 more figures
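The abstract names synthetic camera motion as one way to obtain moving-camera training targets from stationary captures, and Figure 4 mentions scale augmentation. This page does not describe the exact procedure, so the following is only a hypothetical illustration: a smooth zoom and pan can be emulated on a static shot by cropping a window that changes over time and resizing every crop to a fixed output size. The sketch computes just the per-frame crop boxes; `CropBox`, `interpolate_crops`, and the linear schedule are illustrative assumptions, not FaceCam's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropBox:
    """Square crop window in pixel coordinates (hypothetical helper)."""
    cx: float    # window centre x
    cy: float    # window centre y
    size: float  # side length of the square window

    def clamp(self, width, height):
        """Shrink and shift the window so it stays inside a width x height frame."""
        size = min(self.size, width, height)
        half = size / 2.0
        cx = min(max(self.cx, half), width - half)
        cy = min(max(self.cy, half), height - half)
        return CropBox(cx, cy, size)

    def as_xyxy(self):
        """Return (x0, y0, x1, y1) corners of the window."""
        half = self.size / 2.0
        return (self.cx - half, self.cy - half, self.cx + half, self.cy + half)


def interpolate_crops(start, end, num_frames, width, height):
    """Per-frame crop boxes moving linearly from `start` to `end`.

    Resizing each crop to a fixed output size emulates a smooth zoom/pan of a
    stationary camera. This is an image-plane effect only: it cannot produce
    the parallax of a real viewpoint change.
    """
    boxes = []
    for i in range(num_frames):
        a = i / (num_frames - 1) if num_frames > 1 else 0.0
        box = CropBox(
            cx=(1 - a) * start.cx + a * end.cx,
            cy=(1 - a) * start.cy + a * end.cy,
            size=(1 - a) * start.size + a * end.size,
        )
        boxes.append(box.clamp(width, height))
    return boxes


# Example: a 1024x1024 stationary clip; zoom in 2x while panning right and up.
boxes = interpolate_crops(
    start=CropBox(512.0, 512.0, 1024.0),
    end=CropBox(600.0, 480.0, 512.0),
    num_frames=5,
    width=1024,
    height=1024,
)
```

Here the first box covers the full frame, (0, 0, 1024, 1024), and the last is (344, 224, 856, 736). Clamping keeps every window inside the source frame so no padding is needed. Emulating larger viewpoint changes would require genuinely different camera views, which multi-view studio captures can supply.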