
Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting

Hao-Jen Chien, Yi-Chuan Huang, Chung-Ho Wu, Wei-Lun Chao, Yu-Lun Liu

TL;DR

Splannequin tackles freezing monocular Mannequin-Challenge footage by addressing artifacts caused by ill-supervised Gaussian primitives in dynamic Gaussian splatting. It introduces a dual-state regularization that classifies Gaussians as hidden or defective and anchors them to reliable past or future frames, all within an architecture-agnostic, zero-inference-overhead framework. The approach yields substantial perceptual quality improvements and enables real-time, user-controlled freeze-time renderings, demonstrated on real and synthetic data and backed by a 96% user preference. This work enables high-fidelity, freeze-time visualizations for consumer videos, VR/AR experiences, and data augmentation for dynamic scene understanding while maintaining efficiency.

Abstract

Synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos is a unique problem distinct from standard dynamic scene reconstruction. Instead of focusing on modeling motion, our goal is to create a frozen scene while strategically preserving subtle dynamics to enable user-controlled instant selection. To achieve this, we introduce a novel application of dynamic Gaussian splatting: the scene is modeled dynamically, which retains nearby temporal variation, and a static scene is rendered by fixing the model's time parameter. However, under this usage, monocular capture with sparse temporal supervision introduces artifacts like ghosting and blur for Gaussians that become unobserved or occluded at weakly supervised timestamps. We propose Splannequin, an architecture-agnostic regularization that detects two states of Gaussian primitives, hidden and defective, and applies temporal anchoring. Under predominantly forward camera motion, hidden states are anchored to their recent well-observed past states, while defective states are anchored to future states with stronger supervision. Our method integrates into existing dynamic Gaussian pipelines via simple loss terms, requires no architectural changes, and adds zero inference overhead. This results in markedly improved visual quality, enabling high-fidelity, user-selectable frozen-time renderings, validated by a 96% user preference. Project page: https://chien90190.github.io/splannequin/
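The temporal anchoring described in the abstract can be pictured as an extra loss term on flagged Gaussians. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `anchoring_loss`, the squared-distance penalty, the exponential confidence decay, and the parameter `tau` are all hypothetical choices made for illustration. The abstract only states that hidden states are anchored to past states, defective states to future states, and that this is done through simple loss terms.

```python
import numpy as np

def anchoring_loss(params_t, params_ref, t, t_ref, mask, tau=0.1):
    """Hypothetical anchoring term (illustrative, not the paper's exact loss).

    params_t   : (N, D) per-Gaussian parameters evaluated at time t
    params_ref : (N, D) the same Gaussians evaluated at the reference time t_ref
    mask       : (N,) boolean array, True for Gaussians flagged hidden or defective
    tau        : assumed decay scale for the temporal confidence weight
    """
    if not mask.any():
        return 0.0
    # Assumed form: confidence decays exponentially with temporal distance,
    # so nearby reference frames anchor more strongly than distant ones.
    confidence = np.exp(-abs(t - t_ref) / tau)
    # Squared distance between current and reference states of flagged Gaussians.
    sq_dist = np.sum((params_t[mask] - params_ref[mask]) ** 2, axis=1)
    return float(confidence * sq_dist.mean())
```

In this sketch a hidden Gaussian at time `t` would be paired with a reference time `t_ref < t`, and a defective Gaussian with `t_ref > t`. Because the term only enters the training objective, nothing changes at render time, which is consistent with the zero-inference-overhead claim.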

Paper Structure

This paper contains 19 sections, 5 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Existing Gaussian splatting frameworks cannot produce plausible results from casually captured Mannequin-Challenge videos. (Top) A short clip of the hand-held input sequence exhibits unintentional subject motion. (Right) State-of-the-art methods, including SC-GS [huang2024sc], 4DGaussians [wu20244d], and D-3DGS [yang2024deformable], all leave noticeable blur and double contours around the woman’s face (blue frames). (Left) Our Splannequin reconstruction (red frame) is crisp and temporally consistent, revealing fine hair strands and facial detail with no ghosting.
  • Figure 2: Time-Camera Conceptualization. Assuming forward camera motion, the diagonal dashed line represents standard dynamic rendering, while the horizontal line shows freeze-time rendering at a fixed timestamp $t^\star$. Along this freeze-time line, unsupervised Gaussians are either hidden (red points, as the camera has passed them) or defective (blue points, not yet well-observed). Our approach regularizes these problematic Gaussians by anchoring them to their supervised counterparts from other timestamps: hidden (red) Gaussians use past states, and defective (blue) Gaussians use future states. The right panel shows a bird's-eye view of a hallway, illustrating how the camera's path creates defective and hidden regions.
  • Figure 3: Illustration of hidden Gaussians. Given a timestamp, hidden Gaussians (Left) lie outside the camera frustum, receiving no supervision, while visible Gaussians (Right) are rasterized to form the image. Our method targets ill-supervised hidden Gaussians to prevent visual artifacts.
  • Figure 4: Splannequin Pipeline Overview. The pipeline (1) extracts point clouds from the input video, (2) uses dynamic Gaussian splatting with dual-detection losses that anchor hidden Gaussians to earlier frames ($t' < t$) and defective Gaussians to later frames ($t < t'$), and (3) renders freeze-time videos at any timestamp $t^\star$. Temporal distance-based confidence weighting sets the regularization strength: closer reference frames provide stronger anchoring than distant ones, which improves temporal consistency and suppresses artifacts.
  • Figure 5: Qualitative Comparison across Our Real-World Benchmark. Each column shows freeze-time renderings from all methods at a viewpoint. Rows correspond to direct comparisons of identical viewpoints with baselines: 4DGaussians (top), D-3DGS (middle), and SC-GS (bottom). Adding Splannequin consistently produces sharper, more temporally coherent results, with less ghosting and fewer artifacts than the baseline methods.
  • ...and 4 more figures
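
The Figure 3 caption characterizes hidden Gaussians as those lying outside the camera frustum at a given timestamp. A rough way to picture such a test is sketched below, assuming a simple pinhole camera. The function name `hidden_mask`, the intrinsics `fx, fy, cx, cy`, the `near` threshold, and the use of Gaussian centers alone are illustrative assumptions; the paper's actual detection criteria are not given in this excerpt, and the defective state is not covered here.

```python
import numpy as np

def hidden_mask(centers, world_to_cam, fx, fy, cx, cy, width, height, near=0.01):
    """Flag Gaussian centers that fall outside an assumed pinhole-camera frustum.

    centers      : (N, 3) Gaussian centers in world coordinates
    world_to_cam : (4, 4) rigid transform from world to camera frame
    Returns an (N,) boolean array, True where the center is not visible.
    """
    # Lift to homogeneous coordinates and move into the camera frame.
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)
    cam = (world_to_cam @ homo.T).T[:, :3]
    z = cam[:, 2]
    in_front = z > near
    # Use a dummy depth for points behind the camera to avoid dividing by zero;
    # those points are rejected by `in_front` regardless of their projection.
    safe_z = np.where(in_front, z, 1.0)
    u = fx * cam[:, 0] / safe_z + cx
    v = fy * cam[:, 1] / safe_z + cy
    in_image = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return ~(in_front & in_image)
```

A mask of this kind could select which Gaussians receive an anchoring term at a given training timestamp. A real pipeline would likely also account for Gaussian extent and occlusion, which this sketch ignores.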