GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians
Dasong Gao, Peter Zhi Xuan Li, Vivienne Sze, Sertac Karaman
TL;DR
GEVO tackles the memory and energy limitations of monocular GS-based SLAM by rendering past views from a compact Gaussian map instead of storing numerous images, which reduces memory overhead on mobile devices. It introduces occupancy-preserving initialization to prune occlusions and consistency-aware optimization to prevent overfitting, enabling high-fidelity maps without retraining on stored past images. On the Replica and TUM-RGBD datasets, GEVO achieves rendering and localization accuracy comparable to prior methods while reducing memory overhead to about 58 MB, up to 94x lower than prior work, making GS-based SLAM feasible on energy-constrained platforms. The approach paves the way for real-time, dense, photo-realistic mapping on devices with tight memory budgets, balancing fidelity and practicality for AR/VR and mobile robotics.
Abstract
Constructing a high-fidelity representation of a 3D scene with a monocular camera can enable a wide range of applications on mobile devices, such as micro-robots, smartphones, and AR/VR headsets. On these devices, memory is often limited in capacity, and memory access often dominates compute energy consumption. Although Gaussian Splatting (GS) allows for high-fidelity reconstruction of 3D scenes, current GS-based SLAM is not memory efficient: it stores a large number of past images to retrain the Gaussians and reduce catastrophic forgetting. These images often require two orders of magnitude more memory than the map itself and thus dominate total memory usage. In this work, we present GEVO, a GS-based monocular SLAM framework that achieves fidelity comparable to prior methods by rendering past images from the existing map instead of storing them. We propose novel Gaussian initialization and optimization techniques that remove artifacts from the map and delay the degradation of the rendered images over time. Across a variety of environments, GEVO achieves comparable map fidelity while reducing the memory overhead to around 58 MB, up to 94x lower than prior works.
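The memory argument above can be made concrete with a back-of-envelope calculation. The sketch below compares keeping every keyframe image resident (store-and-replay) against keeping only a camera pose per keyframe so that past views can be re-rendered from the map when needed. The keyframe count, resolution, 8-bit RGB format, and 7-float pose (quaternion plus translation) are hypothetical illustration values, not figures from GEVO, and the function names are invented for this example.

```python
# Back-of-envelope comparison: storing keyframe images vs. storing only poses.
# All sizes and counts are hypothetical illustrations, not GEVO's numbers.


def image_buffer_bytes(num_keyframes, width, height, channels=3, bytes_per_channel=1):
    """Bytes needed to keep every keyframe image resident (store-and-replay)."""
    return num_keyframes * width * height * channels * bytes_per_channel


def pose_buffer_bytes(num_keyframes, floats_per_pose=7, bytes_per_float=4):
    """Bytes needed to keep only camera poses (quaternion + translation, float32)
    so that past views can be re-rendered from the map on demand."""
    return num_keyframes * floats_per_pose * bytes_per_float


if __name__ == "__main__":
    n, w, h = 1000, 640, 480  # hypothetical keyframe count and resolution
    img = image_buffer_bytes(n, w, h)
    pose = pose_buffer_bytes(n)
    print(f"images: {img / 2**20:.1f} MiB, poses: {pose / 2**10:.1f} KiB")
```

With these assumed values the script prints `images: 878.9 MiB, poses: 27.3 KiB`. The comparison deliberately excludes the memory of the Gaussian map itself and the extra compute spent re-rendering views, both of which a real system would have to account for.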
