Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography
Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos
TL;DR
The paper tackles the challenge of producing immersive user-generated cinematic content by explicitly coupling camera motion to the actor along three perceptual dimensions: aesthetics, spatial action, and emotion. It introduces a two-stage framework: a self-supervised aesthetic adjustor, which refines the initial camera placement using camera projection and rule-of-thirds constraints, and a GAN-based trajectory synthesizer, which maps actor kinematics and emotion to camera motion under trajectory and adversarial losses. Key contributions include a rule-of-thirds composition module with a dedicated loss $\mathcal{L}_{aes}$, a fine-grained encoder–decoder generator with saliency-guided tracking, and shape-regularized, emotion-conditioned trajectory generation that yields high immersion along the spatial, emotional, and aesthetic axes. The framework is validated in a Unity3D environment with substantial datasets and ablation studies, showing better spatial tracking, emotional responsiveness, and aesthetic framing than the baselines; live demonstrations are provided in the supplementary material. The approach enables robust, user-driven, immersive cinematography for virtual production and user-generated content (UGC), with possible extensions to lighting and camera intrinsics to further enhance realism.
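The summary names a rule-of-thirds loss but does not give its formula. As a minimal sketch under stated assumptions, the Python below projects an actor's head position through a pinhole camera in Unity-style left-handed axes (x right, y up, z forward), scores the composition as the squared distance to the nearest rule-of-thirds intersection, and refines the camera yaw by brute-force search. The class `PinholeCamera`, the functions `thirds_loss` and `adjust_yaw`, and the choice of yaw as the only adjusted parameter are hypothetical; the paper's adjustor is a learned, self-supervised model, not a grid search.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PinholeCamera:
    """Hypothetical pinhole camera, Unity-style left-handed axes
    (x right, y up, z forward). Yaw rotates about +y in radians;
    yaw = 0 looks along +z."""
    x: float
    y: float
    z: float
    yaw: float
    fov_y: float = math.radians(60)  # vertical field of view
    aspect: float = 16 / 9           # image width / height

    def project(self, px, py, pz):
        """Map a world point to normalized image coords (u, v) in [0, 1]^2,
        origin at the top-left. Returns None if the point is behind the camera."""
        dx, dy, dz = px - self.x, py - self.y, pz - self.z
        # Undo the camera yaw so the camera looks along +z in its own frame.
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        cam_x = c * dx - s * dz
        cam_z = s * dx + c * dz
        cam_y = dy
        if cam_z <= 1e-9:
            return None
        f = 1.0 / math.tan(self.fov_y / 2)  # focal length in NDC units
        ndc_x = (f / self.aspect) * cam_x / cam_z
        ndc_y = f * cam_y / cam_z
        # NDC [-1, 1] -> [0, 1]; flip y so v grows downward like image rows.
        return (ndc_x + 1) / 2, (1 - ndc_y) / 2


# The four rule-of-thirds intersections ("power points") in normalized coords.
THIRDS_POINTS = [(u, v) for u in (1 / 3, 2 / 3) for v in (1 / 3, 2 / 3)]


def thirds_loss(camera, target):
    """Assumed composition loss: squared distance from the projected target
    to the nearest power point; infinite if the target is behind the camera."""
    uv = camera.project(*target)
    if uv is None:
        return math.inf
    u, v = uv
    return min((u - pu) ** 2 + (v - pv) ** 2 for pu, pv in THIRDS_POINTS)


def adjust_yaw(camera, target, span=math.radians(30), steps=61):
    """Brute-force stand-in for a learned adjustor: try evenly spaced yaw
    offsets in [-span, span] and keep the camera with the lowest loss."""
    candidates = (
        replace(camera, yaw=camera.yaw - span + 2 * span * i / (steps - 1))
        for i in range(steps)
    )
    return min(candidates, key=lambda cam: thirds_loss(cam, target))


if __name__ == "__main__":
    cam = PinholeCamera(0.0, 1.6, 0.0, yaw=0.0)
    head = (0.0, 1.6, 5.0)          # actor's head, 5 m straight ahead
    print(thirds_loss(cam, head))   # centred actor: 2 * (1/6) ** 2, about 0.056
    better = adjust_yaw(cam, head)
    print(math.degrees(better.yaw), thirds_loss(better, head))
```

Because only yaw is searched, the actor can move horizontally but not vertically in the frame, so the loss here bottoms out near 1/36 rather than 0; a fuller sketch would also search pitch, camera position, or focal length.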
Abstract
User-generated cinematic creations are gaining popularity as daily entertainment, yet mastering cinematography to produce immersive content remains a challenge. Many existing automatic methods focus on roughly controlling predefined shot types or movement patterns, and such methods struggle to engage viewers with the actor's circumstances. Real-world cinematographic rules show that directors can create immersion by comprehensively synchronizing the camera with the actor. Inspired by this strategy, we propose a deep camera control framework that enables actor-camera synchronization in three aspects, considering frame aesthetics, spatial action, and emotional status on the 3D virtual stage. Following the rule of thirds, our framework first modifies the initial camera placement to position the actor aesthetically. This adjustment is performed by a self-supervised adjustor that analyzes frame composition via camera projection. We then design a GAN model that adversarially synthesizes fine-grained camera movement based on the physical action and psychological state of the actor, using an encoder-decoder generator to map kinematic and emotional variables into camera trajectories. Moreover, we incorporate a regularizer that aligns the generated stylistic variance with specific emotional categories and intensities. Experimental results show, both quantitatively and qualitatively, that the proposed method yields high-quality, immersive cinematic videos. Live examples can be found in the supplementary video.
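The abstract mentions a regularizer that aligns stylistic variance with emotion category and intensity but does not specify it. One possible form, sketched below purely as an assumption, measures a trajectory's stylistic variance as its mean squared discrete acceleration and penalizes the squared gap to a target equal to a per-category base level times the intensity. `BASE_STYLE`, its numeric values, `stylistic_variance`, and `emotion_style_regularizer` are hypothetical names and numbers chosen for illustration, not taken from the paper.

```python
# Hypothetical base jitter level per emotion category (illustrative numbers only).
BASE_STYLE = {"calm": 0.01, "sad": 0.02, "happy": 0.05, "angry": 0.10}


def stylistic_variance(traj):
    """Mean squared second difference (discrete acceleration) of a camera
    trajectory given as (x, y, z) positions sampled at a fixed frame rate."""
    if len(traj) < 3:
        return 0.0
    total = 0.0
    for a, b, c in zip(traj, traj[1:], traj[2:]):
        total += sum((ci - 2 * bi + ai) ** 2 for ai, bi, ci in zip(a, b, c))
    return total / (len(traj) - 2)


def emotion_style_regularizer(traj, category, intensity):
    """Assumed regularizer: squared gap between the trajectory's stylistic
    variance and a target of BASE_STYLE[category] * intensity."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    target = BASE_STYLE[category] * intensity
    return (stylistic_variance(traj) - target) ** 2


if __name__ == "__main__":
    # Camera zigzagging by +/-0.05 along x: each second difference is +/-0.2,
    # so the stylistic variance is 0.2 ** 2 = 0.04.
    zigzag = [((-1) ** t * 0.05, 1.6, 0.0) for t in range(20)]
    print(stylistic_variance(zigzag))
    print(emotion_style_regularizer(zigzag, "happy", 0.8))  # target 0.04, gap near 0
    print(emotion_style_regularizer(zigzag, "calm", 1.0))   # target 0.01, gap near 0.0009
```

In a GAN setting, a term like this would be added with some weight to the generator's trajectory and adversarial losses; actual training would need a differentiable tensor version rather than this plain-Python one.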
