Voxel-Aggregated Feature Synthesis: Efficient Dense Mapping for Simulated 3D Reasoning

Owen Burns; Rizwan Qureshi

TL;DR

VAFS addresses the heavy computational cost of open-set dense 3D mapping. It starts from the segmented point clouds that a simulator already provides, synthesizes views of each object, and then aggregates points into voxels to keep point density uniform. As a result, the number of embeddings scales with the number of objects in the scene rather than the number of captured frames. On RoCoBench semantic queries, the method runs faster and achieves higher semantic IoU than ConceptFusion and LeRF. This makes dense 3D mapping practical for simulation-based embodied research and for real-time map updates. The key contributions are per-region synthetic view generation, voxel pooling for density control, and a pipeline for computing ground-truth semantic maps.
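The voxel pooling step can be pictured with a short sketch. It rests on several assumptions. Points arrive as `(x, y, z)` tuples, each paired with a feature vector of fixed length. "Pooling" is taken to mean averaging positions and features within each cubic voxel of edge length `voxel_size`. The helper name `voxel_pool` and the averaging rule are illustrative and are not taken from the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxel_pool(
    points: Sequence[Point],
    features: Sequence[Sequence[float]],
    voxel_size: float,
) -> Tuple[List[Point], List[List[float]]]:
    """Hypothetical helper: average positions and features inside each occupied voxel.

    Assumes every feature vector has the same length.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    if len(points) != len(features):
        raise ValueError("points and features must have equal length")

    pos_sum: Dict[VoxelKey, List[float]] = {}
    feat_sum: Dict[VoxelKey, List[float]] = {}
    count: Dict[VoxelKey, int] = {}
    for p, f in zip(points, features):
        # Integer index of the voxel cell that contains this point.
        key = (
            math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size),
        )
        if key not in count:
            pos_sum[key] = [0.0, 0.0, 0.0]
            feat_sum[key] = [0.0] * len(f)
            count[key] = 0
        for i in range(3):
            pos_sum[key][i] += p[i]
        for j, v in enumerate(f):
            feat_sum[key][j] += v
        count[key] += 1

    # Emit exactly one averaged point and feature per occupied voxel, in sorted voxel order.
    pooled_points: List[Point] = []
    pooled_features: List[List[float]] = []
    for key in sorted(count):
        n = count[key]
        s = pos_sum[key]
        pooled_points.append((s[0] / n, s[1] / n, s[2] / n))
        pooled_features.append([v / n for v in feat_sum[key]])
    return pooled_points, pooled_features
```

Every occupied voxel yields one output point, no matter how many raw points fell into it. The output density is therefore bounded by the voxel grid. This is one plausible reading of "voxel pooling for density control".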

Abstract

We address the exploding computational requirements of recent state-of-the-art (SOTA) open-set multimodal 3D mapping (dense 3D mapping) algorithms and present Voxel-Aggregated Feature Synthesis (VAFS), a novel approach to dense 3D mapping in simulation. Dense 3D mapping involves segmenting and embedding sequential RGBD frames, which are then fused into 3D. This leads to redundant computation: consecutive frames differ only slightly, yet each one is segmented and embedded individually. As a result, dense 3D mapping is impractical for research involving embodied agents, in which the environment, and thus the map, changes frequently. VAFS drastically reduces this computation by using the segmented point cloud computed by a simulator's physics engine and synthesizing views of each region. This reduces the number of features to embed from the number of captured RGBD frames to the number of objects in the scene, allowing a "ground truth" semantic map to be computed an order of magnitude faster than with traditional methods. We evaluate the resulting representation by measuring the IoU of semantic queries for different objects in the simulated scene, and find that VAFS exceeds both the accuracy and the speed of prior dense 3D mapping techniques.
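The per-object embedding and the IoU-based evaluation can be sketched as follows, under several assumptions:

- Each point carries a simulator-provided segment id.
- `embed_object` is a hypothetical stand-in for rendering synthesized views of one region and encoding them with an open-vocabulary image encoder. That rendering and encoding step is not shown.
- A query selects every point belonging to an object whose cosine similarity with the query embedding reaches a threshold.

The function names and the thresholding rule are illustrative, not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def build_object_map(
    segment_ids: Sequence[int],
    embed_object: Callable[[int], List[float]],  # hypothetical: synthesize views + encode
) -> Dict[int, List[float]]:
    """Embed each object once, so the cost scales with unique segment ids, not frames."""
    return {sid: embed_object(sid) for sid in sorted(set(segment_ids))}


def query_points(
    segment_ids: Sequence[int],
    object_map: Dict[int, List[float]],
    query_vec: Sequence[float],
    threshold: float,  # assumed similarity cutoff
) -> Set[int]:
    """Indices of points whose object embedding is similar enough to the query."""
    matched = {sid for sid, vec in object_map.items() if cosine(vec, query_vec) >= threshold}
    return {i for i, sid in enumerate(segment_ids) if sid in matched}


def iou(pred: Set[int], truth: Set[int]) -> float:
    """Intersection over union of two point-index sets (1.0 if both are empty)."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0
```

In a per-frame pipeline, the encoder would run on every segment of every RGBD frame. Here `build_object_map` calls it once per unique segment id, which illustrates the reduction the abstract describes. For example, with six points spread over three objects, `embed_object` is called three times.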

Figures (2)