Uni-Fusion: Universal Continuous Mapping
Yijun Yuan, Andreas Nuechter
TL;DR
Uni-Fusion introduces a universal, training-free framework for continuous mapping that encodes geometry and arbitrary surface properties into Latent Implicit Maps (LIM) using per-voxel latent features and Nyström-based kernel approximation. By decoupling regression into a position encoder and a content encoder, the method builds compact voxel latents (l≈20) that fuse incrementally into a global LIM, enabling real-time surface reconstruction, color/infrared fields, and high-dimensional feature fields such as CLIP embeddings. The approach supports derivative-based and sample-based GPIS for surface inference and extends to surface property and feature fields with a flexible fusion scheme. It is demonstrated on incremental reconstruction, 2D-to-3D property transfer, and open-vocabulary scene understanding. Experiments on ScanNet, TUM RGB-D, Replica, and segmentation datasets show competitive or superior performance with a significantly reduced memory footprint and real-time operation, while new properties can be incorporated without training. The work lays a foundation for universal 3D mapping and CLIP-based scene understanding, with potential extensions to loop closure, bundle adjustment, and visual-language navigation.
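To make the position/content decoupling concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a squared-exponential kernel, l = 20 randomly placed anchor points inside a unit voxel, and ridge regression as the content encoder. The function names (`rbf_kernel`, `nystrom_position_encoder`, `encode_content`), the length scale, and the ridge weight are hypothetical choices made for this example only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between rows of a (n, d) and b (m, d).
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def nystrom_position_encoder(anchors, length_scale=0.5, eps=1e-8):
    # Position encoder (assumed form): Nystrom feature map built from l anchors.
    # phi(x) = k(x, anchors) @ U @ diag(eigvals)^(-1/2), so that
    # phi(x) @ phi(y).T approximates k(x, y).
    K_mm = rbf_kernel(anchors, anchors, length_scale)
    eigvals, eigvecs = np.linalg.eigh(K_mm)
    eigvals = np.maximum(eigvals, eps)      # guard against tiny eigenvalues
    proj = eigvecs / np.sqrt(eigvals)       # scale column j by eigvals[j]^(-1/2)

    def encode(x):
        return rbf_kernel(x, anchors, length_scale) @ proj  # shape (n, l)

    return encode

def encode_content(phi, values, ridge=1e-3):
    # Content encoder (assumed form): ridge regression in the feature space.
    # The returned (l, c) matrix plays the role of the voxel latent.
    l = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + ridge * np.eye(l), phi.T @ values)

# Hypothetical data: 200 points in one unit voxel with a 3-channel property.
rng = np.random.default_rng(0)
anchors = rng.uniform(0.0, 1.0, size=(20, 3))
points = rng.uniform(0.0, 1.0, size=(200, 3))
colors = np.stack([points[:, 0], 1.0 - points[:, 1], 0.5 * points[:, 2]], axis=1)

encode = nystrom_position_encoder(anchors)
latent = encode_content(encode(points), colors)   # (20, 3) voxel latent
decoded = encode(points) @ latent                 # property queried at the points
```

Because the position encoder depends only on the anchors, it can be shared by every voxel; only the small latent matrix changes with the encoded content, which is what allows the same machinery to store geometry, color, or higher-dimensional features in this sketch.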
Abstract
We present Uni-Fusion, a universal continuous mapping framework for surfaces, surface properties (color, infrared, etc.), and more (latent features in CLIP embedding space, etc.). We propose the first universal implicit encoding model that supports encoding of both geometry and different types of properties (RGB, infrared, features, etc.) without requiring any training. Based on this, our framework divides the point cloud into regular grid voxels and generates a latent feature in each voxel to form a Latent Implicit Map (LIM) for geometries and arbitrary properties. Then, by fusing the local LIM of each frame into a global LIM, incremental reconstruction is achieved. Encoded with the corresponding types of data, our Latent Implicit Map is capable of generating continuous surfaces, surface property fields, surface feature fields, and other possible fields. To demonstrate the capabilities of our model, we implement three applications: (1) incremental reconstruction for surfaces and color, (2) 2D-to-3D transfer of fabricated properties, and (3) open-vocabulary scene understanding by creating a text CLIP feature field on surfaces. We evaluate Uni-Fusion through comparisons in each of these applications, where it shows high flexibility while performing best or being competitive. The project page of Uni-Fusion is available at https://jarrome.github.io/Uni-Fusion/ .
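The frame-by-frame fusion of a local LIM into a global LIM can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the method itself: the abstract does not specify the fusion rule, so a weighted running average of per-voxel latents is assumed here, and the helper names (`voxel_index`, `fuse_into_global`) and the dictionary layout are hypothetical.

```python
import numpy as np

def voxel_index(point, voxel_size):
    # Map a 3D point to the integer index of the regular grid voxel containing it.
    return tuple(np.floor(np.asarray(point) / voxel_size).astype(int).tolist())

def fuse_into_global(global_lim, local_lim):
    # Each map is a dict: voxel index -> (latent array, weight).
    # Assumed fusion rule: weighted average of latents, with weights accumulated.
    for key, (latent, weight) in local_lim.items():
        if key in global_lim:
            g_latent, g_weight = global_lim[key]
            total = g_weight + weight
            fused = (g_weight * g_latent + weight * latent) / total
            global_lim[key] = (fused, total)
        else:
            global_lim[key] = (latent.copy(), weight)
    return global_lim

# Hypothetical usage: two frames that both observe voxel (0, 0, 0).
global_lim = {}
frame_1 = {(0, 0, 0): (np.ones(4), 2.0)}
frame_2 = {(0, 0, 0): (np.full(4, 4.0), 1.0), (1, 0, 0): (np.zeros(4), 3.0)}
for frame in (frame_1, frame_2):
    fuse_into_global(global_lim, frame)
```

Storing the map sparsely, keyed by voxel index, means that only observed voxels consume memory, and each new frame touches only the voxels it actually covers.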
