Sophia-in-Audition: Virtual Production with a Robot Performer
Taotao Zhou, Teng Xu, Dong Zhang, Yuyang Jiao, Peijun Xu, Yaoyu He, Lan Xu, Jingyi Yu
TL;DR
This work introduces Sophia-in-Audition (SiA), a novel virtual production framework that places the humanoid robot Sophia inside an UltraStage with a controllable lighting dome and a 32-camera rig, so that performance, lighting, and camera movement can be managed together. It develops facial motion transfer and identity augmentation pipelines, including ARKit-based mapping, diffusion-based appearance enhancement, and 3D Gaussian Splatting for novel-view rendering, and contributes a first-of-its-kind multi-view robot performance dataset with dynamic lighting. The authors demonstrate replication of iconic film scenes, dynamic lighting, and virtual camera moves. A user study shows generally positive audience reception, a perceived reduction in uncanny valley effects, and lighting rated as approaching professional film standards. While acknowledging limitations in realism and motion fluidity, the work argues that SiA offers a practical, cost-effective audition and pre-visualization platform and paves the way for broader adoption of humanoid robots in virtual production and related research.
Abstract
We present Sophia-in-Audition (SiA), a new frontier in virtual production that employs the humanoid robot Sophia within an UltraStage environment composed of a controllable lighting dome coupled with multiple cameras. We demonstrate Sophia's capability to replicate iconic film segments, follow real performers, and perform a variety of motions and expressions, showcasing her versatility as a virtual actor. Key to this process is the integration of facial motion transfer algorithms with the UltraStage's controllable lighting and multi-camera setup, which enables dynamic performances that align with the director's vision. Our comprehensive user studies indicate positive audience reception of Sophia's performances and highlight her potential to reduce the uncanny valley effect in virtual acting. Additionally, the immersive lighting in dynamic clips was rated highly for its naturalness and its ability to mirror professional film standards. We also present a first-of-its-kind multi-view robot performance video dataset with dynamic lighting, offering valuable insights for future enhancements in humanoid robotic performers and virtual production techniques. In summary, this research contributes a unique virtual production setup, tools for sophisticated performance control, and a comprehensive dataset and user study analysis for diverse applications.
