SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling

Yijun Yuan; Michael Bleier; Andreas Nüchter

TL;DR

SceneFactory presents a unified, modular framework for incremental scene modeling that links tracking, depth estimation, and reconstruction in a dependency-driven workflow. It introduces four building blocks (tracking, flexion, depth estimation, reconstruction) and two novel components (DM-NPs for Surface Light Fields and IPR for fast surface querying) to support a wide range of inputs, including unposed/uncalibrated multi-view data and RGB-LiDAR streams. The depth module ($U^2$-MVD) combines dense correspondences, robust checks, and dense bundle adjustment (DBA) with a ScaleCov depth completion pipeline, enabling both RGB-D and unposed multi-view depth estimation, while reconstruction uses online learning of DM-NPs for high-quality color and surface representations. The paper also provides a new RGB-X dense monocular SLAM dataset and demonstrates competitive performance against state-of-the-art methods on diverse benchmarks, highlighting the framework’s flexibility, scalability, and potential for real-time, large-scale scene modeling. Overall, SceneFactory offers a practical, extensible pathway toward unified, production-line-like scene modeling across varied sensing modalities and tasks.

Abstract

We present SceneFactory, a workflow-centric and unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multi-view depth estimation, LiDAR completion, (dense) RGB-D/RGB-L/Mono/Depth-only reconstruction and SLAM. The workflow-centric design uses multiple blocks as the basis for constructing different production lines. The supported applications (i.e., productions) avoid redundancy in their designs. Thus, the focus is placed on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks that form SceneFactory: (1) tracking, (2) flexion, (3) depth estimation, and (4) scene reconstruction. The tracking block is based on Mono SLAM and is extended to support RGB-D and RGB-LiDAR (RGB-L) inputs. Flexion is used to convert the (untrackable) depth image into a trackable image. For general-purpose depth estimation, we propose an unposed & uncalibrated multi-view depth estimation model (U$^2$-MVD) to estimate dense geometry. U$^2$-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. A semantic-aware ScaleCov step is then introduced to complete the multi-view depth. Relying on U$^2$-MVD, SceneFactory both supports user-friendly 3D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high-quality surface and color reconstruction, we propose Dual-purpose Multi-resolutional Neural Points (DM-NPs) for the first surface-accessible Surface Color Field design, where we introduce Improved Point Rasterization (IPR) for point-cloud-based surface query. ...

Paper Structure (30 sections, 10 equations, 8 figures, 1 algorithm)

Introduction
Related Works
Dense SLAM
Neural Rendering in SLAM
Multi-view Depth Estimation
A Unified Framework for Incremental Scene Modeling
Tracking Block
Flexion Estimation Block
Depth Estimation Block
Reconstruction Block
Online-learning thread
Visualization thread
Main Function
Dual-purposes Multiresolutional Neural Points
Multiresolutional Neural Points
...and 15 more sections

Figures (8)

Figure 1: SceneFactory is workflow-centric and supports a wide range of applications given different input combinations of RGB $\mathbf I_\text{rgb}$, depth $\mathbf I_\text{d}$, pose $\mathbf G$ and intrinsics $\mathbf \theta$.
Figure 2: The dependency graph in SceneFactory. SceneFactory sends requests to its dependent sub-tasks (inputs/blocks). If the dependent sub-tasks are not complete, then each sub-task will call its corresponding dependent sub-tasks. The gray line indicates the requirement of a specific application. The yellow line shows the dependency of input (RGB $\mathbf I_\text{rgb}$, depth $\mathbf I_\text{d}$, pose $\mathbf G$ and intrinsics $\mathbf\theta$), which is triggered when the input value is None. The green, purple, blue, and pink lines show the dependencies of the Flexion, Depth, Tracking, and Scene Reconstruction blocks. The solid and dotted lines show mandatory and optional dependencies. Inside each block, the functions are applied one after another, as shown by the black arrows.
Figure 3: Depth images (left) and their corresponding trackable converted flexion images (right).
Figure 4: Illustration of the multiresolution neural points in 2D. The top row shows (a) allocation and (b) training during online learning. The bottom row shows (c) rasterization and (d) color prediction for visualization. We use multiple levels of neural points, for example, green dots for the low level and pink dots for the higher level. The corresponding circles indicate the resolution of each level of points.
Figure 5: Near-far-imbalance. The ray penetrates the near surface.
...and 3 more figures
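
The request mechanism described in the Figure 2 caption (a sub-task is invoked only when the value it produces is None, and it first requests its own dependencies) can be illustrated with a small recursive resolver. This is a minimal sketch and not the authors' implementation: the `Task` class, the `resolve` and `run_pipeline` functions, the three-node graph, and the string placeholders standing in for images and poses are all hypothetical, and the distinction between mandatory and optional dependencies is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Task:
    """Hypothetical graph node: a producer with a list of dependencies."""

    def __init__(self, deps: List[str], run: Callable[..., str]) -> None:
        self.deps = deps  # names of the values this task needs
        self.run = run    # function that produces this task's value


def resolve(name: str, tasks: Dict[str, Task],
            values: Dict[str, Optional[str]], order: List[str]) -> str:
    """Return the value for `name`, computing it only if it is None."""
    value = values.get(name)
    if value is not None:
        return value  # already supplied or already computed
    if name not in tasks:
        raise ValueError(f"'{name}' is missing and no block can produce it")
    task = tasks[name]
    # Request every dependency first; each may trigger its own producer.
    args = [resolve(dep, tasks, values, order) for dep in task.deps]
    result = task.run(*args)
    values[name] = result
    order.append(name)  # record which blocks actually ran
    return result


# Illustrative graph only; the real edges are those drawn in Figure 2.
TASKS: Dict[str, Task] = {
    "depth": Task(["rgb"], lambda rgb: f"depth({rgb})"),
    "pose": Task(["rgb"], lambda rgb: f"pose({rgb})"),
    "scene": Task(["rgb", "depth", "pose"],
                  lambda rgb, d, g: f"scene({rgb},{d},{g})"),
}


def run_pipeline(inputs: Dict[str, Optional[str]]) -> Tuple[str, List[str]]:
    """Resolve the 'scene' product and report which blocks were run."""
    values = dict(inputs)  # copy so the caller's inputs are not mutated
    order: List[str] = []
    scene = resolve("scene", TASKS, values, order)
    return scene, order


if __name__ == "__main__":
    # Mono-style input: depth and pose are None, so both producers run.
    print(run_pipeline({"rgb": "I_rgb", "depth": None, "pose": None}))
    # RGB-D-style input: depth is supplied, so only the pose producer runs.
    print(run_pipeline({"rgb": "I_rgb", "depth": "I_d", "pose": None}))
```

In this sketch, supplying more inputs simply shortens the chain of requests, which is one way to read the claim that different applications share blocks rather than duplicating them.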
