RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration
Omar Alama, Avigyan Bhattacharya, Haoyang He, Seungchan Kim, Yuheng Qiu, Wenshan Wang, Cherie Ho, Nikhil Keetha, Sebastian Scherer
TL;DR
RayFronts tackles open-set semantic mapping for open-world robotics by unifying dense within-range voxel semantics with beyond-range ray frontiers in a single semantic map. It also introduces a planner-agnostic evaluation metric that quantifies search-space reduction. The method achieves state-of-the-art open-vocabulary 3D segmentation and strong online mapping performance while running in real time on embedded hardware ($8.84\ \mathrm{Hz}$ on an Orin AGX). It reduces search volume $2.2\times$ more efficiently than the closest online baselines and improves fine-grained, depth-aware semantics (up to $1.85\times$ offline mIoU). The approach is validated with planner-agnostic online benchmarks and offline 3D segmentation on diverse datasets, and qualitative real-world experiments show robust performance in open-world environments. Together, these results suggest that RayFronts is a practical, scalable solution for open-world robotic perception and exploration.
Abstract
Open-set semantic mapping is crucial for open-world robots. Current mapping approaches are either limited by depth range or map beyond-range entities only in constrained settings; overall, they fail to combine within-range and beyond-range observations. Furthermore, these methods trade off fine-grained semantics against efficiency. We introduce RayFronts, a unified representation that enables both dense and beyond-range efficient semantic mapping. RayFronts encodes task-agnostic open-set semantics into both in-range voxels and beyond-range rays placed at map boundaries, empowering the robot to reduce search volumes significantly and make informed decisions both within and beyond sensory range, while running at 8.84 Hz on an Orin AGX. Benchmarking the within-range semantics shows that RayFronts's fine-grained image encoding provides 1.34x zero-shot 3D semantic segmentation performance while improving throughput by 16.5x. Traditionally, online mapping performance is entangled with other system components, complicating evaluation. We propose a planner-agnostic evaluation framework that captures the utility for online beyond-range search and exploration, and show RayFronts reduces search volume 2.2x more efficiently than the closest online baselines.
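
To make the idea of search-volume reduction concrete, the toy sketch below shows one way semantic rays at a map boundary could prune the space a robot still has to search. It is an illustration under stated assumptions, not the RayFronts implementation or its evaluation metric: the `Ray` class, the `remaining_search_fraction` function, the cone half-angle, the similarity threshold, and the two-dimensional features are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Ray:
    """Hypothetical beyond-range ray anchored at the map boundary."""
    origin: tuple     # (x, y, z) point on the boundary of the mapped region
    direction: tuple  # unit vector pointing beyond sensing range
    feature: tuple    # toy open-set semantic embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def in_cone(point, ray, half_angle_deg):
    """True if `point` lies inside the cone around `ray` (assumed shape)."""
    v = tuple(p - o for p, o in zip(point, ray.origin))
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return True
    cos_angle = sum(a * b for a, b in zip(v, ray.direction)) / norm
    return cos_angle >= math.cos(math.radians(half_angle_deg))


def remaining_search_fraction(candidates, rays, query,
                              sim_thresh=0.8, half_angle_deg=15.0):
    """Fraction of candidate locations still worth searching for `query`.

    Assumption: a candidate is kept only if it falls in the cone of at
    least one ray whose feature matches the query embedding.
    """
    matching = [r for r in rays if cosine(r.feature, query) >= sim_thresh]
    kept = [p for p in candidates
            if any(in_cone(p, r, half_angle_deg) for r in matching)]
    return len(kept) / len(candidates)


# Unexplored candidate locations on a flat grid in front of the robot.
candidates = [(float(x), float(y), 0.0)
              for x in range(1, 11) for y in range(-5, 6)]

# Two boundary rays with different toy semantics.
rays = [
    Ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0)),  # matches the query
    Ray((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0)),  # unrelated semantics
]

query = (1.0, 0.0)  # stand-in for a text-query embedding
frac = remaining_search_fraction(candidates, rays, query)
print(f"remaining search fraction: {frac:.3f}")
```

In this toy setup only the candidates inside the cone of the query-matching ray survive, so the fraction printed is well below 1. A real system would use high-dimensional features from a vision-language encoder and a volumetric map rather than a flat grid.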
