RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration

Omar Alama, Avigyan Bhattacharya, Haoyang He, Seungchan Kim, Yuheng Qiu, Wenshan Wang, Cherie Ho, Nikhil Keetha, Sebastian Scherer

TL;DR

RayFronts tackles the challenge of open-set semantic mapping in open-world robotics by unifying within-range dense voxel semantics with beyond-range ray frontiers. It introduces a semantic map that couples a voxel-based within-range representation with ray-centered beyond-range reasoning, supported by a planner-agnostic evaluation metric to quantify search-space reduction. The method achieves state-of-the-art open-vocabulary 3D segmentation and strong online mapping performance, running at real-time speeds on embedded hardware ($8.84\ \mathrm{Hz}$ on an Orin AGX) and delivering substantial improvements in search efficiency ($2.2\times$ reduction) and fine-grained depth-aware semantics (up to $1.85\times$ offline mIoU). The approach is validated through planner-agnostic online benchmarks and offline 3D segmentation on diverse datasets, with qualitative real-world experiments showing robust performance in open-world environments. Together, these results suggest RayFronts as a practical, scalable solution for open-world robotic perception and exploration.

Abstract

Open-set semantic mapping is crucial for open-world robots. Current mapping approaches are either limited by depth range or map beyond-range entities only in constrained settings; overall, they fail to combine within-range and beyond-range observations. Furthermore, these methods trade off fine-grained semantics against efficiency. We introduce RayFronts, a unified representation that enables both dense and beyond-range efficient semantic mapping. RayFronts encodes task-agnostic open-set semantics into both in-range voxels and beyond-range rays at map boundaries, empowering the robot to reduce search volumes significantly and make informed decisions both within and beyond sensory range, while running at 8.84 Hz on an Orin AGX. Benchmarking the within-range semantics shows that RayFronts' fine-grained image encoding provides 1.34x zero-shot 3D semantic segmentation performance while improving throughput by 16.5x. Traditionally, online mapping performance is entangled with other system components, complicating evaluation. We propose a planner-agnostic evaluation framework that captures the utility for online beyond-range search and exploration, and show RayFronts reduces search volume 2.2x more efficiently than the closest online baselines.
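
To make the idea of open-set querying over in-range voxels concrete, the following is a minimal sketch, not the RayFronts implementation. It assumes each voxel stores a language-aligned feature vector and that a text query has already been embedded into the same space; the names `query_voxels`, `toy_map`, and `toy_query`, the cosine-similarity scoring, and the threshold value are all illustrative assumptions rather than details taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]  # integer voxel index (hypothetical map layout)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_voxels(
    voxel_features: Dict[Voxel, Sequence[float]],
    text_embedding: Sequence[float],
    threshold: float = 0.5,  # assumed value, chosen only for illustration
) -> List[Voxel]:
    """Return voxels whose stored feature is similar enough to the query embedding."""
    return sorted(
        voxel
        for voxel, feature in voxel_features.items()
        if cosine_similarity(feature, text_embedding) >= threshold
    )


# Toy data: three voxels with 3-dimensional stand-in features.
toy_map = {
    (0, 0, 0): [1.0, 0.0, 0.0],
    (1, 0, 0): [0.9, 0.1, 0.0],
    (0, 1, 0): [0.0, 1.0, 0.0],
}
toy_query = [1.0, 0.0, 0.0]  # stand-in for an embedded text prompt
matches = query_voxels(toy_map, toy_query, threshold=0.8)
print(matches)
```

Because the stored features are task-agnostic, the same map can be queried with any number of different prompts after mapping, which is what makes the representation open-set in this sketch.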

Paper Structure

This paper contains 24 sections, 3 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 2: Overview of our online mapping system. RayFronts is designed for multi-objective and multi-modal open-set querying of both in-range and beyond-range semantic entities. Given posed RGB-D images, we first extract dense features with our fast language-aligned image encoder. Then, posed depth information and features are used to construct a semantic voxel map for in-range queries. In parallel, RayFronts also maintains a VDB-based occupancy map to generate frontiers, which are further associated with multi-directional semantic rays. These semantic ray frontiers enable us to perform beyond-range querying of open-set concepts in the unobserved region.
  • Figure 3: An illustration of our proposed planner-agnostic metric (Search Cut Volume Recall) for open-world online search benchmarking. Intuitively, the metric captures "How much of the search volume is eliminated correctly?" An optimal mapper should promptly and accurately reduce the search space, enabling fast multi-object localization and exploration.
  • Figure 4: RayFronts consistently surpasses baselines for online semantic mapping. Two query scenarios are shown: (1) querying for a prominent object (i.e., Building) that enters depth range, and (2) a distant object (i.e., Chimney) that remains beyond range. Through unified dense voxel mapping and beyond-range semantic ray frontiers, RayFronts sets the upper bound in both scenarios.
  • Figure 5: RayFronts provides state-of-the-art mIoU & 17.5 Hz throughput on an AGX Orin. It surpasses Trident with 1.34x higher mIoU and a 16.5x speedup, while achieving 1.81x higher mIoU than NACLIP, which operates at a similar throughput.
  • Figure A.1: The top left shows how RayFronts avoids feature collisions by using multiple rays that capture distinct semantics observed through the same frontier, where semantic frontier approaches (chen2023train, yokoyama2024vlfm) fail. The top right illustrates that, even with no depth information, RayFronts' dense language-aligned encoding allows it to capture non-prominent semantics where semantic pose approaches (thomas2024embedding) fail. The bottom row highlights that RayFronts is the upper bound in accurately reducing search volume.
  • ...and 3 more figures
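
The intuition given for the planner-agnostic metric in Figure 3 ("How much of the search volume is eliminated correctly?") can be illustrated with a small sketch. This is one possible reading under stated assumptions, not the paper's definition of Search Cut Volume Recall: the search volume is assumed to be a set of voxels, the mapper is assumed to output the subset it still considers worth searching, and the score is taken to be the fraction of target-free voxels that were eliminated. The function name `search_cut_recall` and all variable names are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def search_cut_recall(
    search_volume: Set[Voxel],
    target_voxels: Set[Voxel],
    kept_voxels: Set[Voxel],
) -> float:
    """Fraction of target-free voxels that the mapper eliminated.

    Assumed definition for illustration only; the paper's exact formula
    is not reproduced here.
    """
    eliminable = search_volume - target_voxels  # voxels that are safe to cut
    if not eliminable:
        return 1.0  # nothing could be cut, so nothing was missed
    eliminated = search_volume - kept_voxels  # voxels the mapper ruled out
    correctly_eliminated = eliminated & eliminable
    return len(correctly_eliminated) / len(eliminable)


# Toy corridor of 10 voxels with the target in the last one.
volume = {(x, 0, 0) for x in range(10)}
target = {(9, 0, 0)}
kept = {(7, 0, 0), (8, 0, 0), (9, 0, 0)}  # mapper narrows search to the far end
score = search_cut_recall(volume, target, kept)
print(round(score, 3))
```

Under this reading the score depends only on the map's output, not on how a planner would traverse the remaining volume, which matches the planner-agnostic intent described in the abstract. A fuller version would presumably also penalize cutting voxels that contain the target.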