Time-Archival Camera Virtualization for Sports and Visual Performances

Yunxiao Zhang, William Stone, Suryansh Kumar

TL;DR

By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, this method learns a neural representation that provides enhanced rendering quality at test time.

Abstract

Camera virtualization, an emerging solution to novel view synthesis, holds transformative potential for visual entertainment, live performances, and sports broadcasting: it generates photorealistic images from novel viewpoints using images from a limited set of calibrated, static physical cameras. Despite recent advances, achieving spatially and temporally coherent, photorealistic rendering of dynamic scenes with efficient time-archival capabilities remains challenging for existing approaches, particularly in fast-paced sports and stage performances. Recent methods based on 3D Gaussian Splatting (3DGS) for dynamic scenes can offer real-time view synthesis. Yet they are hindered by their dependence on accurate 3D point clouds from structure-from-motion (SfM) and by their inability to handle large, non-rigid, rapid motions of different subjects (e.g., flips, jumps, articulations, sudden player-to-player transitions). Moreover, independent motions of multiple subjects can break the Gaussian-tracking assumptions commonly used in 4DGS, ST-GS, and other dynamic splatting variants. This paper advocates reconsidering a neural volume rendering formulation for camera virtualization with efficient time-archival capabilities, making it useful for sports broadcasting and related applications. By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, our method performs neural representation learning that provides enhanced rendering quality at test time. A key contribution of our approach is its support for time-archival: users can revisit any past temporal instance of a dynamic scene and perform novel view synthesis on it, enabling retrospective rendering for replay, analysis, and archival of live events, a functionality absent in existing neural rendering approaches and novel view synthesis...
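The abstract argues for a neural volume rendering formulation. As a hedged illustration of what rendering a virtual camera at a chosen time instant involves, the sketch below casts rays through a pinhole virtual camera and composites samples with the standard NeRF-style quadrature C = sum_i T_i (1 - exp(-sigma_i delta_i)) c_i. This is not the authors' method: their learned scene model is replaced by a hypothetical analytic field, `toy_sphere_field` (a sphere translating over time), and every function name, the camera convention, and the near/far sampling bounds are assumptions chosen for illustration.

```python
import numpy as np


def virtual_camera_rays(K, c2w, height, width):
    """Ray origins and unit directions through each pixel centre of a pinhole camera.

    K is a 3x3 intrinsic matrix and c2w a 4x4 camera-to-world pose. The camera
    looks along +z in its own frame (an OpenCV-style convention, assumed here).
    """
    j, i = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    pix = np.stack([i + 0.5, j + 0.5, np.ones((height, width))], axis=-1)  # (H, W, 3)
    dirs = pix @ np.linalg.inv(K).T          # back-project pixels to camera-frame rays
    dirs = dirs @ c2w[:3, :3].T              # rotate rays into the world frame
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(c2w[:3, 3], dirs.shape)
    return origins, dirs


def composite(sigma, rgb, t_vals):
    """NeRF-style quadrature: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i."""
    deltas = np.diff(t_vals, axis=-1)
    # Treat the last interval as effectively infinite.
    deltas = np.concatenate([deltas, np.full(deltas.shape[:-1] + (1,), 1e10)], axis=-1)
    alpha = 1.0 - np.exp(-sigma * deltas)    # opacity of each sample interval
    # T_i = prod_{j<i} (1 - alpha_j): exclusive cumulative product (transmittance)
    trans = np.cumprod(
        np.concatenate([np.ones_like(alpha[..., :1]), 1.0 - alpha[..., :-1]], axis=-1),
        axis=-1,
    )
    weights = trans * alpha
    color = (weights[..., None] * rgb).sum(axis=-2)
    return color, weights


def toy_sphere_field(points, time, radius=0.5):
    """HYPOTHETICAL stand-in for a learned field: a red sphere moving along +x over time."""
    center = np.array([0.3 * time, 0.0, 4.0])
    inside = np.linalg.norm(points - center, axis=-1) < radius
    sigma = np.where(inside, 50.0, 0.0)
    rgb = np.broadcast_to(np.array([1.0, 0.2, 0.2]), points.shape)
    return sigma, rgb


def render(field, K, c2w, height, width, time, near=2.0, far=6.0, n_samples=64):
    """Render the scene state at `time` from the virtual camera (K, c2w)."""
    origins, dirs = virtual_camera_rays(K, c2w, height, width)
    t_vals = np.linspace(near, far, n_samples)                          # (S,)
    pts = origins[..., None, :] + dirs[..., None, :] * t_vals[:, None]  # (H, W, S, 3)
    sigma, rgb = field(pts, time)
    color, _ = composite(sigma, rgb, np.broadcast_to(t_vals, sigma.shape))
    return color


if __name__ == "__main__":
    K = np.array([[32.0, 0.0, 16.0], [0.0, 32.0, 16.0], [0.0, 0.0, 1.0]])
    c2w = np.eye(4)  # virtual camera at the origin, looking down +z
    for t in (0.0, 10.0):
        img = render(toy_sphere_field, K, c2w, 32, 32, time=t)
        print(t, img[16, 16])
```

Rendering the same virtual camera at `time=0.0` and `time=10.0` shows, in miniature, what time-archival asks of a renderer: a stored past instant can be re-queried from a viewpoint that no physical camera occupied. In the paper, the analytic field would be replaced by the learned representation.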

Paper Structure

This paper contains 20 sections, 15 equations, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: (a)-(d) Camera virtualization for football, showing images rendered from cameras placed at different distances from the subject(s): (a) far-distance viewpoint, (c) near-distance viewpoint, (d) top viewpoint.
  • Figure 2: Quantitative comparison of image rendering quality on the proposed dynamic scene dataset with state-of-the-art methods.
  • Figure 3: Overall setup for our application. A typical synchronized multiview camera setup installed at a sports venue for broadcasting is shown in black. Virtual cameras for inspecting the scene, either for time-archival or for broadcast from viewpoints of the user's interest, are shown in red. (a)-(b) Top row: a typical football scene and renderings from virtual cameras. (c)-(d) Bottom row: a typical tennis match scene and the corresponding renderings from virtual camera viewpoints.
  • Figure 4: Visual performance: qualitative comparison with the 4DGS approach wu20244d on our synthetic multiview dataset. Left: the four camera frustums highlighted in red show the virtual cameras used for dynamic scene broadcasting. Right: our renderings from those virtual cameras at a given time compared with those of the 4D-GS approach wu20244d, with PSNR and LPIPS values for quantitative comparison. Here, VC denotes the corresponding virtual camera.
  • Figure 5: Results on the CMU Panoptic dataset joo2015panoptic. Left: multiview camera setup; actual cameras are shown in black, whereas virtual cameras are highlighted in red. Right: results of our approach on a couple of challenging sports sequences. Here, VC denotes virtual camera.
  • ...and 5 more figures
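Figure 4 reports PSNR and LPIPS. PSNR has a closed form, PSNR = 10 log10(MAX^2 / MSE), so a minimal sketch is easy to give; it assumes images scaled to [0, 1] (so MAX = 1), and the paper's exact evaluation protocol (masking, color space, cropping) is not stated here. LPIPS depends on a pretrained network and is not reproduced.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images have unbounded PSNR
    return float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    target = np.zeros((4, 4, 3))
    pred = np.full((4, 4, 3), 0.1)  # uniform error of 0.1 -> MSE = 0.01
    print(psnr(pred, target))       # approximately 20 dB
```

Because PSNR is logarithmic in the MSE, every tenfold reduction in mean squared error adds 10 dB, which is why small dB gaps between methods can still reflect meaningful differences in pixel error.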