Time-Archival Camera Virtualization for Sports and Visual Performances

Yunxiao Zhang; William Stone; Suryansh Kumar

TL;DR

By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, this method performs neural representation learning, providing enhanced visual rendering quality at test time.

Abstract

Camera virtualization -- an emerging solution to novel view synthesis -- holds transformative potential for visual entertainment, live performances, and sports broadcasting by enabling the generation of photorealistic images from novel viewpoints, using images from a limited set of calibrated, static physical cameras. Despite recent advances, spatially and temporally coherent, photorealistic rendering of dynamic scenes with efficient time-archival capabilities remains challenging for existing approaches, particularly in fast-paced sports and stage performances. Recent methods based on 3D Gaussian Splatting (3DGS) for dynamic scenes can offer real-time view synthesis. Yet they are hindered by their dependence on accurate 3D point clouds from structure-from-motion and by their inability to handle large, non-rigid, rapid motions of different subjects (e.g., flips, jumps, articulations, sudden player-to-player transitions). Moreover, independent motions of multiple subjects can break the Gaussian-tracking assumptions commonly used in 4DGS, ST-GS, and other dynamic splatting variants. This paper advocates reconsidering a neural volume rendering formulation for camera virtualization with efficient time-archival capabilities, making it useful for sports broadcasting and related applications. By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, our method performs neural representation learning, providing enhanced visual rendering quality at test time. A key contribution of our approach is its support for time-archival: users can revisit any past temporal instance of a dynamic scene and perform novel view synthesis there, enabling retrospective rendering for replay, analysis, and archival of live events, a functionality absent in existing neural rendering approaches and novel view synthesis...
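For readers unfamiliar with the term, the "neural volume rendering formulation" mentioned in the abstract generally refers to compositing densities and colors sampled along a camera ray. The following is a minimal sketch of that standard quadrature only, not the authors' implementation. The function name `composite_ray` and its arguments are hypothetical, and in practice the per-sample densities and colors would come from a learned network rather than from fixed lists.

```python
import math


def composite_ray(sigmas, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray.

    sigmas: non-negative densities, one per sample (assumed inputs).
    colors: (r, g, b) tuples, one per sample.
    deltas: distances between consecutive samples along the ray.
    Returns the accumulated RGB color and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for c in range(3):
            rgb[c] += weight * color[c]
        # Light remaining for the samples further along the ray.
        transmittance *= 1.0 - alpha
    return rgb, weights


# Usage: an opaque red sample in front hides the green sample behind it.
rgb, weights = composite_ray([1e3, 1e3], [(1, 0, 0), (0, 1, 0)], [0.1, 0.1])
```

Rendering a novel view then amounts to repeating this compositing for one ray per pixel of the virtual camera.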

Paper Structure (20 sections, 15 equations, 10 figures, 6 tables)

Introduction
Related Work
Methodology
Experiment and Results
Dataset for Evaluation
Evaluation and Result
Ablations
Discussion
Limitations
Conclusion
Technical Appendices
Synthetic Multiview Dataset Camera Pose Acquisition via Fibonacci Sphere Sampling
Technical Details on CMU Multiview Dataset Calibration and Coordinate Conversion
Model Training Technical Details
Spatial bounding box.
...and 5 more sections
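
One appendix title above names Fibonacci sphere sampling as the way camera poses were obtained for the synthetic multiview dataset. The sketch below shows the standard Fibonacci (golden-angle) sphere construction under stated assumptions. The function names, the radius, the choice of a scene centered at the origin, and the helper that points each camera at the origin are illustrative and are not taken from the paper.

```python
import math


def fibonacci_sphere(n, radius=1.0):
    """Return n roughly evenly spaced points on a sphere centered at the origin."""
    if n < 1:
        raise ValueError("n must be at least 1")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # Heights are spaced uniformly in z, which gives equal-area bands.
        z = 1.0 - (2.0 * i + 1.0) / n
        ring_radius = math.sqrt(max(0.0, 1.0 - z * z))
        # Successive points rotate by the golden angle around the z-axis.
        theta = golden_angle * i
        points.append((
            radius * ring_radius * math.cos(theta),
            radius * ring_radius * math.sin(theta),
            radius * z,
        ))
    return points


def view_direction_to_origin(position):
    """Unit vector from a camera position toward the origin (assumed scene center)."""
    x, y, z = position
    norm = math.sqrt(x * x + y * y + z * z)
    return (-x / norm, -y / norm, -z / norm)


# Usage: eight hypothetical camera centers on a sphere of radius 2,
# each paired with a viewing direction aimed at the scene center.
cameras = [(p, view_direction_to_origin(p)) for p in fibonacci_sphere(8, radius=2.0)]
```

A full camera pose would also need an up vector to fix the roll; that step is omitted here.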

Figures (10)

Figure 1: (a)-(d) Camera virtualization for a football scene, showing images rendered from cameras placed at different distances from the subject(s), i.e., (a) far-distance viewpoint, (c) near-distance viewpoint, (d) top viewpoint.
Figure 2: Quantitative comparison of the image rendering quality on the proposed dynamic scene dataset with state.
Figure 3: Overall setup for our application. A typical multiview synchronized camera setup installed in a sports scene for broadcasting (shown in black). The virtual cameras used to inspect the scene, for time-archival or for broadcast from viewpoints of the user's interest, are shown in red. (a)-(b) Top row: A typical football scene and images rendered from virtual cameras. (c)-(d) Bottom row: A typical tennis match scene and the corresponding images rendered from virtual camera viewpoints.
Figure 4: Visual performance: qualitative comparison with the 4D-GS approach wu20244d on our synthetic multiview dataset. Left: The four camera frustums highlighted in red show the virtual cameras used for dynamic scene broadcasting. Right: Our rendered images from those virtual cameras at a given time, compared with the 4D-GS approach wu20244d. We also provide PSNR and LPIPS values for quantitative comparison. Here, VC denotes the corresponding virtual camera.
Figure 5: Results on the CMU Panoptic dataset joo2015panoptic. Left: Multiview camera setup. Actual cameras are shown in black, whereas virtual cameras are highlighted in red. Right: Results of our approach on a couple of challenging sports sequences. Here, VC denotes virtual camera.
...and 5 more figures
