Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering

Weiquan Wang, Feifei Shao, Lin Li, Zhen Wang, Jun Xiao, Long Chen

TL;DR

This work reframes monocular occluded human rendering as MAP estimation under heteroscedastic observation noise and introduces U-4DGS, a framework combining a Probabilistic Deformation Network with a Double Rasterization pipeline to produce pixel-aligned uncertainty maps that guide optimization. The uncertainty acts as an adaptive gradient modulator, selectively down-weighting unreliable observations and enabling confidence-aware regularizations to prevent geometric drift in occluded regions. Extensive experiments on ZJU-MoCap and OcMotion demonstrate state-of-the-art rendering fidelity and robustness, outperforming both discriminative and generative baselines under severe occlusion. The approach provides a principled, efficient alternative to hallucination-based methods, with practical implications for real-time, occlusion-resilient dynamic human rendering.

Abstract

High-fidelity rendering of dynamic humans from monocular videos typically degrades catastrophically under occlusions. Existing solutions incorporate external priors: they either hallucinate missing content via generative models, which induces severe temporal flickering, or impose rigid geometric heuristics that fail to capture diverse appearances. Instead, we reformulate the task as a Maximum A Posteriori (MAP) estimation problem under heteroscedastic observation noise and propose U-4DGS, a framework integrating a Probabilistic Deformation Network and a Double Rasterization pipeline. This architecture renders pixel-aligned uncertainty maps that act as an adaptive gradient modulator, automatically attenuating artifacts from unreliable observations. Furthermore, to prevent geometric drift in regions lacking reliable visual cues, we enforce Confidence-Aware Regularizations, which leverage the learned uncertainty to selectively propagate spatial-temporal validity. Extensive experiments on ZJU-MoCap and OcMotion demonstrate that U-4DGS achieves state-of-the-art rendering fidelity and robustness.
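To make the "adaptive gradient modulator" idea concrete, the following is a minimal sketch of a per-pixel Gaussian negative log-likelihood with a predicted standard deviation. It is not the paper's $\mathcal{L}_{NLL}$, whose exact form is not given in this summary; the function names, the scalar per-pixel setup, and the `eps` stabilizer are assumptions made for illustration. The point it shows is generic to heteroscedastic regression: the gradient with respect to the prediction is the residual divided by the variance, so pixels with high predicted uncertainty contribute weaker gradients.

```python
import math


def heteroscedastic_nll(pred, target, sigma, eps=1e-6):
    """Gaussian NLL for one pixel, up to an additive constant.

    Assumed form: residual^2 / (2 * sigma^2) + log(sigma).
    `eps` is a hypothetical stabilizer to avoid division by zero.
    """
    var = sigma * sigma + eps
    residual = pred - target
    return 0.5 * residual * residual / var + 0.5 * math.log(var)


def nll_grad_wrt_pred(pred, target, sigma, eps=1e-6):
    """Analytic derivative of the NLL above with respect to `pred`.

    Equals residual / sigma^2: the variance divides the gradient.
    """
    return (pred - target) / (sigma * sigma + eps)


# Same photometric residual, two different predicted uncertainties.
grad_confident = nll_grad_wrt_pred(pred=0.9, target=0.4, sigma=0.1)
grad_uncertain = nll_grad_wrt_pred(pred=0.9, target=0.4, sigma=1.0)
print(grad_confident, grad_uncertain)
```

With the same residual, the pixel with the larger `sigma` receives a much smaller gradient, which is the down-weighting behavior the abstract describes. The `log(sigma)` term penalizes large uncertainty, so the model cannot trivially declare every pixel unreliable.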

Paper Structure (16 sections, 10 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Performance, Fidelity, and Stability. (a) Our U-4DGS achieves the best trade-off between rendering quality and training efficiency. (b) The occluded input and the corresponding uncertainty map predicted by our method. (c)-(f) Visual comparison. While our method (d) recovers fine details consistent with the Reference (c), GauHuman (e) fails catastrophically, fusing occlusion artifacts into the body. SymGaussian (f) fails to recover asymmetric details (see red circle), as the model erroneously propagates features from the unadorned opposite side. (g) Temporal consistency. GTU suffers from severe texture drifting, where the shirt color shifts inconsistently over time. In contrast, our uncertainty-guided aggregation ensures physical consistency.
  • Figure 2: The framework of U-4DGS. Left: The Probabilistic Deformation Network conditions Canonical Gaussians on time embedding $\gamma(t)$ and pose $\theta_t$ to predict geometric offsets ($\Delta \mathbf{r}, \Delta \boldsymbol{\mu}, \Delta \mathbf{s}$) alongside per-primitive aleatoric uncertainty $\sigma$. Middle: The deformed Gaussians are transformed via LBS and rendered through a Double Rasterization pipeline, simultaneously producing a photometric image and a pixel-aligned Uncertainty Map (where bright regions indicate high uncertainty). Right: During optimization, the Uncertainty Map functions as an adaptive gradient modulator (symbolized by $\div$) in the $\mathcal{L}_{NLL}$ objective, effectively attenuating gradients from unreliable observations. Simultaneously, Confidence-Aware Regularizations ($\mathcal{L}_{spa}, \mathcal{L}_{temp}$) leverage the learned uncertainty to enforce spatial-temporal constraints in regions lacking reliable visual cues.
  • Figure 3: Qualitative comparisons on novel view synthesis. Left: Results on the ZJU-MoCap dataset with synthetic occlusions. Right: Results on the OcMotion dataset with real-world occlusions. GH denotes GauHuman [hu2024gauhuman] and OF denotes OccFusion [sun2024occfusion]. GauHuman fails to disentangle occluders from the human body. OccFusion tends to produce blurry textures or hallucination artifacts in heavily occluded regions. Our U-4DGS recovers high-fidelity geometry and appearance consistent with the reference view.
  • Figure 4: Qualitative ablation study. Exp. A (Baseline): The deterministic baseline overfits the occlusion, resulting in severe artifacts. Exp. B (+ Uncertainty): Explicit uncertainty modeling removes the artifacts but leaves the geometry noisy and ill-defined in the occluded region. Exp. C (+ $\mathcal{L}_{spa}$): Spatial regularization smooths out the noise, restoring a complete body shape. Exp. D (+ $\mathcal{L}_{temp}$): The full model further refines details and ensures physical plausibility.
  • Figure 5: Visualization of the learned Uncertainty Map. Left: The input image showing only the visible regions of the current frame (occluded regions are black). Right: The predicted uncertainty map $\hat{U}$ rendered by our method. As indicated by the color bar, dark blue represents high confidence (low uncertainty), while bright red indicates high uncertainty. The network correctly assigns high uncertainty to the "missing" occluded regions, effectively creating a soft mask to ignore unreliable supervision during training.
  • ...and 1 more figure
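The Figure 2 caption describes rendering a photometric image and a pixel-aligned Uncertainty Map from the same deformed Gaussians. One plausible reading, sketched below for a single ray, is to run standard front-to-back alpha compositing twice: once over per-primitive colors and once over per-primitive uncertainties $\sigma$. This is an assumption about how the two passes could share blending weights, not the paper's actual rasterizer; `composite_along_ray` and the toy values are hypothetical.

```python
def composite_along_ray(alphas, values):
    """Front-to-back alpha compositing of scalar values along one ray.

    Computes sum_i T_i * alpha_i * v_i with T_i = prod_{j<i} (1 - alpha_j).
    Primitives are assumed to be sorted from nearest to farthest.
    """
    transmittance = 1.0
    out = 0.0
    for alpha, value in zip(alphas, values):
        out += transmittance * alpha * value
        transmittance *= 1.0 - alpha
    return out


# Toy ray crossing two primitives (hypothetical values).
alphas = [0.5, 0.5]        # per-primitive opacity at this pixel
gray = [1.0, 0.0]          # per-primitive intensity (single channel)
sigma = [0.2, 0.8]         # per-primitive uncertainty

pixel_intensity = composite_along_ray(alphas, gray)      # first pass
pixel_uncertainty = composite_along_ray(alphas, sigma)   # second pass
print(pixel_intensity, pixel_uncertainty)
```

Because both passes reuse the same opacities and ordering, the uncertainty value at a pixel is dominated by the same primitives that dominate its color, which is what makes the map pixel-aligned in this sketch.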