FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes

Marcel Büsching, Josef Bengtson, David Nilsson, Mårten Björkman

TL;DR

FlowIBR tackles monocular dynamic novel-view synthesis by decoupling scene dynamics from rendering via a per-scene learned scene flow, which bends camera rays to align with moving content. It combines a pre-trained generalizable static rendering backbone (GNT) with a lightweight scene-flow field implemented on a permutohedral lattice, enabling dynamic scenes to be rendered with a static IBR pipeline. The method achieves substantial reductions in per-scene optimization time (about $1.5$ hours on a single GPU) while delivering competitive rendering quality on the Nvidia Dynamic Scenes Dataset, as validated against state-of-the-art baselines and supported by an extensive ablation study. This approach lowers hardware barriers for dynamic-view synthesis and opens avenues for faster, scalable monocular rendering in dynamic environments.

Abstract

We introduce FlowIBR, a novel approach for efficient monocular novel view synthesis of dynamic scenes. Existing techniques already show impressive rendering quality but tend to focus on optimization within a single scene without leveraging prior knowledge, resulting in long optimization times per scene. FlowIBR circumvents this limitation by integrating a neural image-based rendering method, pre-trained on a large corpus of widely available static scenes, with a per-scene optimized scene flow field. Utilizing this flow field, we bend the camera rays to counteract the scene dynamics, thereby presenting the dynamic scene as if it were static to the rendering network. The proposed method reduces per-scene optimization time by an order of magnitude, achieving comparable rendering quality to existing methods -- all on a single consumer-grade GPU.
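
As a rough illustration of the ray-bending idea described above, the NumPy sketch below samples points along a camera ray, displaces them with a scene flow function, and projects them into a source view. It is a toy under stated assumptions, not the authors' implementation: the function names, the constant-offset `toy_scene_flow`, and the camera parameters are all hypothetical, whereas FlowIBR learns its flow field per scene.

```python
import numpy as np

def sample_ray(origin, direction, near, far, n_samples):
    """Return n_samples 3D points spaced evenly in depth along a ray."""
    depths = np.linspace(near, far, n_samples)
    return origin[None, :] + depths[:, None] * direction[None, :]

def toy_scene_flow(points, t):
    """Hypothetical stand-in for a learned flow field: every point moves
    by the same offset to the next time step (t is ignored in this toy)."""
    return np.broadcast_to(np.array([0.1, 0.0, 0.0]), points.shape)

def bend_ray(points, t, flow_fn):
    """Displace ray samples from time t to the adjacent time step."""
    return points + flow_fn(points, t)

def project(points, K, R, T):
    """Pinhole projection of world points into a source view (pixel coords)."""
    cam = points @ R.T + T          # world -> camera coordinates
    uv = cam @ K.T                  # camera -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]   # perspective divide

# Assumed toy setup: source camera at the world origin looking along +z.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.zeros(3)

points = sample_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), 1.0, 2.0, 3)
straight_uv = project(points, K, R, T)
bent_uv = project(bend_ray(points, t=0, flow_fn=toy_scene_flow), K, R, T)
```

In this setup the unbent samples all project to the same pixel, while the flow-displaced samples spread across the source image. That displaced projection is what lets a static image-based renderer look up the pixels where the moving content actually appears in the adjacent frame.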

Paper Structure (37 sections, 22 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Method overview. (a) An image at an arbitrary position (orange camera) is synthesized based on existing observations (black camera), collected at different times. Problem: Due to the movement of the skater, the skater is not on the epipolar line of the camera ray. (b) We model scene motion using per-scene learned scene flow. (c) Scene flow is used to compensate for the motion by bending the camera ray. (d) A pre-trained neural IBR method for static scenes [t2023AttentionAll] is used for image synthesis.
  • Figure 2: Scene flow compensation. The scene flow ($\mathcal{S}_{f}$, $\mathcal{S}_{b}$) is used to adjust the ray $\bm{r}_{\Tilde{t},m}$ from the target camera $\Tilde{\bm{P}}_{\Tilde{t}}$ through the current pixel $m$, so that it follows the motion of the balloon at the two adjacent times $\Tilde{t}+1$ and $\Tilde{t}-1$. This allows the projection of the ray on the source observations to contain the pixels corresponding to $m$, marked by arrows.
  • Figure 3: Qualitative evaluation on Nvidia Dynamic Scenes (default) [yoon2020NovelView]. Renderings for FlowIBR and the retrained methods (HyperNeRF, DVS, NSFF). FlowIBR is able to synthesize novel views from previously unobserved viewpoints, with quality close to the ground truth (GT) image and to state-of-the-art methods.
  • Figure 4: Scene flow visualization. Projection of $\bm{s}_f$ (top) and $\bm{s}_b$ (bottom) onto the image plane. The arrows in the reference view show the true scene motion. Scene flow is visualized with hue indicating the direction and intensity indicating the magnitude. Here, our method correctly learned the rotating motion shown in the image.
  • Figure 5: Failure cases. (a) Fast-moving objects can exhibit motion blur. (b) For small objects, FlowIBR occasionally fails to learn any scene flow at all, instead learning a continuous near-zero function. (c) In some cases, images contain rendering artifacts, especially for unbounded backgrounds.
  • ...and 3 more figures