Sophia-in-Audition: Virtual Production with a Robot Performer

Taotao Zhou, Teng Xu, Dong Zhang, Yuyang Jiao, Peijun Xu, Yaoyu He, Lan Xu, Jingyi Yu

TL;DR

This work introduces Sophia-in-Audition (SiA), a novel virtual production framework that places the humanoid robot Sophia inside an UltraStage with a controllable lighting dome and a 32-camera rig to simultaneously manage performance, lighting, and camera movement. It develops facial motion transfer and identity augmentation pipelines, including ARKit-based mapping, diffusion-based appearance enhancement, and 3D Gaussian Splatting for novel-view rendering, yielding a first-of-its-kind multi-view robot performance dataset with dynamic lighting. The authors demonstrate replication of iconic film scenes, dynamic lighting, and virtual camera moves, supported by a user study showing generally positive audience reception and a perceived reduction in uncanny valley effects, along with realistic lighting that approaches professional film standards. While acknowledging limitations in realism and motion fluidity, the work argues that SiA offers a practical, cost-effective audition and pre-visualization platform and paves the way for broader adoption of humanoid robots in virtual production and related research.

Abstract

We present Sophia-in-Audition (SiA), a new frontier in virtual production, by employing the humanoid robot Sophia within an UltraStage environment composed of a controllable lighting dome coupled with multiple cameras. We demonstrate Sophia's capability to replicate iconic film segments, follow real performers, and perform a variety of motions and expressions, showcasing her versatility as a virtual actor. Key to this process is the integration of facial motion transfer algorithms and the UltraStage's controllable lighting and multi-camera setup, enabling dynamic performances that align with the director's vision. Our comprehensive user studies indicate positive audience reception towards Sophia's performances, highlighting her potential to reduce the uncanny valley effect in virtual acting. Additionally, the immersive lighting in dynamic clips was highly rated for its naturalness and its ability to mirror professional film standards. The paper presents a first-of-its-kind multi-view robot performance video dataset with dynamic lighting, offering valuable insights for future enhancements in humanoid robotic performers and virtual production techniques. This research contributes significantly to the field by presenting a unique virtual production setup, developing tools for sophisticated performance control, and providing a comprehensive dataset and user study analysis for diverse applications.

Paper Structure

This paper contains 27 sections and 15 figures.

Figures (15)

  • Figure 1: Sophia-in-Audition (SiA). We present a new practice of virtual production: we deploy the humanoid robot Sophia as a virtual performer inside a virtual production studio, in our case an UltraStage composed of a controllable lighting dome, analogous to a Light Stage, coupled with multi-camera video shooting. We call this setup Sophia-in-Audition, or SiA; it allows for simultaneous control over performance, lighting, and camera movements.
  • Figure 2: Overview of Sophia-in-Audition (SiA). We deploy the humanoid robot Sophia as a virtual performer inside the UltraStage (Sec. \ref{sec:sophia-in-audition}), composed of a controllable lighting dome coupled with multi-camera video shooting. We use SiA to collect a comprehensive dataset of Sophia under immersive environment lighting (Sec. \ref{sec:dataset}). We call this setup Sophia-in-Audition, or SiA; it allows for simultaneous control over performance, lighting, and camera movements (Sec. \ref{sec:identity_augmentation}, \ref{sec:virtual_lighting_cam_move}).
  • Figure 3: Virtual Performer Sophia and Her Facial Expression Samples. The brow region features 5 motors allowing for eyebrow lifting, while the eye and eyelid area is equipped with 11 actuators for blinking and eye movement. Additionally, the nose region contains 2 motors, and the mouth area includes 14 motors that control movements of the mouth, tongue, jaw, and surrounding muscles, enabling a wide range of expressive capabilities.
  • Figure 4: Lighting Estimation Pipeline. Estimation begins with depth estimation on the input image, followed by depth map editing. The edited depth map and an inpainting mask feed into a latent diffusion model with fine-tuned LoRA (FT-LoRA), which generates chrome ball reflections at various exposure values. These are then median-combined using HDR bracketing to synthesize a comprehensive HDR map.
  • Figure 5: Lighting Reproduction Pipeline. Lighting reproduction is a three-stage process within the UltraStage. First, light positions are calibrated and projected onto the HDR map, followed by Voronoi partitioning and weighted sampling to capture light regions and colors. Next, the sampled HDR light map is down-converted to LDR while preserving energy through dilation of overexposed areas. Finally, the three-channel light map is mapped onto a six-spectrum LED arrangement using a color chart and non-negative least squares, producing a precise LED light map that is used within the UltraStage for accurate emulation of environment lighting.
  • ...and 10 more figures
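
The Voronoi partitioning and weighted sampling step named in the Figure 5 caption can be illustrated with a small sketch. It is not the authors' implementation. It assumes the HDR map is an equirectangular (latitude-longitude) image, that each light is given as a direction vector, and that the sampling weight is the solid angle of each pixel; the caption does not specify the weighting. The function names `equirect_directions` and `sample_lights` are hypothetical.

```python
import numpy as np

def equirect_directions(height, width):
    """Unit direction and solid-angle weight for each pixel of a lat-long map."""
    # Pixel centers: theta is the polar angle in (0, pi), phi the azimuth in (-pi, pi).
    theta = (np.arange(height) + 0.5) / height * np.pi
    phi = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    # Assumed convention: +y is up, so the top rows of the map look upward.
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.cos(theta),
                     np.sin(theta) * np.sin(phi)], axis=-1)
    # Pixels near the poles cover less of the sphere; their area scales with sin(theta).
    solid_angle = np.sin(theta) * (np.pi / height) * (2.0 * np.pi / width)
    return dirs, solid_angle

def sample_lights(env_map, light_dirs):
    """Average the map's radiance over each light's spherical Voronoi cell."""
    h, w, _ = env_map.shape
    dirs, solid_angle = equirect_directions(h, w)
    light_dirs = np.asarray(light_dirs, dtype=float)
    light_dirs = light_dirs / np.linalg.norm(light_dirs, axis=1, keepdims=True)
    # On the unit sphere the nearest light is the one with the largest dot product,
    # so the argmax below labels every pixel with its Voronoi cell.
    cell = np.argmax(dirs.reshape(-1, 3) @ light_dirs.T, axis=1)
    weights = solid_angle.reshape(-1)
    pixels = env_map.reshape(-1, 3)
    n = light_dirs.shape[0]
    colors = np.zeros((n, 3))
    for c in range(3):
        colors[:, c] = np.bincount(cell, weights=weights * pixels[:, c], minlength=n)
    area = np.bincount(cell, weights=weights, minlength=n)
    # Guard against empty cells when lights are denser than the map resolution.
    return colors / np.maximum(area, 1e-12)[:, None]

# Usage: six axis-aligned lights sampling a constant-colored map.
axis_lights = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
constant_map = np.ones((16, 32, 3)) * np.array([1.0, 0.5, 0.25])
light_colors = sample_lights(constant_map, axis_lights)  # every row is the map color
```

Averaging rather than summing keeps each light's color independent of its cell size. A real dome would more likely need the summed energy per cell, followed by the LDR conversion and the six-spectrum LED mapping that the caption describes; both are left out of this sketch.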