Vision-Aided Online A* Path Planning for Efficient and Safe Navigation of Service Robots

Praveen Kumar, Tushar Sandhan

TL;DR

The paper tackles the need for semantic-aware navigation in service robots without relying on expensive LiDAR. It introduces a hybrid framework that tightly couples a lightweight semantic segmentation model (ESANet) with an online $A^*$ planner operating on a dynamic occupancy grid. The grid is fused from RGB-D data into a unified map that encodes both geometric obstacles and user-defined visual constraints. Key contributions include real-time semantic awareness on embedded hardware, an open-source implementation and dataset, and a system architecture that supports robust, context-aware navigation in unknown environments. Experimental validation in high-fidelity simulation and on real-world hardware demonstrates that a cost-effective robot can safely navigate complex spaces while respecting non-geometric constraints defined by operators. In practice, the work enables flexible, semantically guided navigation for service robots using affordable sensors and computation.

Abstract

The deployment of autonomous service robots in human-centric environments is hindered by a critical gap in perception and planning. Traditional navigation systems rely on expensive LiDARs that, while geometrically precise, are semantically unaware: they cannot distinguish an important document on an office floor from a harmless piece of litter, treating both as physically traversable. While advanced semantic segmentation exists, no prior work has successfully integrated this visual intelligence into a real-time path planner that is efficient enough for low-cost, embedded hardware. This paper presents a framework to bridge this gap, delivering context-aware navigation on an affordable robotic platform. Our approach centers on a novel, tight integration of a lightweight perception module with an online A* planner. The perception system employs a semantic segmentation model to identify user-defined visual constraints, enabling the robot to navigate based on contextual importance rather than physical size alone. This adaptability allows an operator to define what is critical for a given task, be it sensitive papers in an office or safety lines in a factory, thus resolving the ambiguity of what to avoid. This semantic perception is fused with geometric data: the identified visual constraints are projected as non-geometric obstacles onto a global map that is continuously updated from sensor data, enabling robust navigation through both partially known and unknown environments. We validate our framework through extensive experiments in high-fidelity simulations and on a real-world robotic platform. The results demonstrate robust, real-time performance, showing that a cost-effective robot can safely navigate complex environments while respecting critical visual cues invisible to traditional planners.
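
As a rough illustration of how visual constraints can be projected as non-geometric obstacles onto the same map as geometric ones, the sketch below marks cells of a 2D grid from two sets of ground-plane points. It is a minimal sketch under stated assumptions, not the authors' implementation: the cell labels, the function names `world_to_cell`, `fuse_into_grid`, and `blocked_mask`, the 0.1 m resolution, and the rule that geometric hits overwrite semantic ones are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical cell labels (not taken from the paper).
FREE, GEOMETRIC, SEMANTIC = 0, 1, 2


def world_to_cell(points_xy, origin_xy, resolution):
    """Convert (N, 2) metric x/y points to integer (row, col) grid indices."""
    pts = np.asarray(points_xy, dtype=float).reshape(-1, 2)
    cols = np.floor((pts[:, 0] - origin_xy[0]) / resolution).astype(int)
    rows = np.floor((pts[:, 1] - origin_xy[1]) / resolution).astype(int)
    return rows, cols


def fuse_into_grid(grid, geometric_xy, semantic_xy,
                   origin_xy=(0.0, 0.0), resolution=0.1):
    """Return a copy of `grid` with cells hit by either point set marked.

    Semantic points are written first, so a geometric hit on the same
    cell overwrites the semantic label (an assumption of this sketch).
    """
    fused = grid.copy()
    for points, label in ((semantic_xy, SEMANTIC), (geometric_xy, GEOMETRIC)):
        rows, cols = world_to_cell(points, origin_xy, resolution)
        # Drop points that fall outside the grid bounds.
        inside = ((rows >= 0) & (rows < fused.shape[0]) &
                  (cols >= 0) & (cols < fused.shape[1]))
        fused[rows[inside], cols[inside]] = label
    return fused


def blocked_mask(grid):
    """Boolean mask of cells a planner should treat as non-traversable."""
    return grid != FREE


if __name__ == "__main__":
    grid = np.zeros((5, 5), dtype=int)
    fused = fuse_into_grid(grid,
                           geometric_xy=[[0.25, 0.05]],
                           semantic_xy=[[0.05, 0.35]])
    print(fused)
```

With this representation a planner only needs `blocked_mask(fused)`: a flat sheet of paper and a wall both become non-traversable cells, even though only the wall appears in the depth-derived geometric points.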

Paper Structure

This paper contains 16 sections, 3 equations, 13 figures, 1 table, 2 algorithms.

Figures (13)

  • Figure 1: Illustrates the real-world setups and scenarios: Husky A200 wheeled robot equipped with an Intel RealSense D455 depth camera, navigating an indoor environment with avoidable items (e.g., foam), unavoidable items (e.g., electrical cables, joystick), and a color-marked prohibited area. The robot encounters several challenges: interpreting color-coded markings and symbols to identify prohibited areas (a); distinguishing muddy water at floor level to prevent misclassification as navigable terrain (b); avoiding reserved spaces marked with specific colors (c); recognizing user-defined color-coded ground surfaces indicative of organized zones (d); identifying industry-standard color markings denoting workstations for safety and critical operations (e); detecting potential landmines embedded within the navigation path (f); and recognizing chemical spills present on the surface (g). Each scenario is integrated into a comprehensive knowledge graph, enabling our algorithm to plan safe and user-centric navigation paths.
  • Figure 2: System Architecture Overview: Data from the D455 RGB-D camera is processed in two streams. The RGB and depth streams feed the ESANet semantic segmentation model, whose output is combined with depth data to generate a segmented region point cloud. The depth stream provides a filtered point cloud representing geometric obstacles. Both point clouds are fused into a dynamic occupancy grid, updating the global map. An online A* algorithm uses this map, along with the robot's current state (Odometry) and user-defined goal, to compute a global path. This path is converted into waypoints for a local planner, which generates final velocity commands $(v, \omega)$ for the robot.
  • Figure 3: Perception Module Training Pipeline: The process begins with the administrator defining a workspace-specific "Beware list" and setting up the environment. The D455 camera captures RGB and Depth data. RGB images are manually annotated based on the "Beware list" to create a labeled training dataset. This dataset (RGB images, Depth maps, and Semantic Labels) is used to train the ESANet model via supervised learning, optimizing a loss function to produce an accurate semantic segmentation model tailored for the target application.
  • Figure 4: Navigation workspace populated with user-defined items and color-marked zones placed on the ground surface. Left and right frames illustrate the scene setup, highlighting areas that the robot must detect and avoid during path planning.
  • Figure 5: RGB-D Semantic Segmentation Results. Frames 1 to 6 display a workstation navigation scene containing both considerable and nonconsiderable objects. Frames $1^*$ to $6^*$ show the corresponding semantic segmentation outputs generated by our trained ESANet model. These masks highlight the identified "Beware list" items (e.g., objects to avoid, designated zones), demonstrating the model's ability to accurately perceive and classify task-relevant semantic information from raw RGB input.
  • ...and 8 more figures
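
The planning stage described in the Figure 2 caption, an A* search over an occupancy grid that is re-run as the map changes, can be sketched as follows. This is a minimal textbook sketch, not the paper's planner: the 4-connected neighborhood, unit step cost, Manhattan heuristic, and the function name `astar` are assumptions made for illustration. "Online" behavior is approximated here simply by calling the search again after the grid is updated.

```python
import heapq


def astar(blocked, start, goal):
    """4-connected A* on a 2D grid of booleans (True = non-traversable).

    Returns the path as a list of (row, col) cells from start to goal,
    or None if the goal is unreachable.
    """
    rows, cols = len(blocked), len(blocked[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    came_from = {}
    g_cost = {start: 0}

    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk parent links back to the start, then reverse.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if g > g_cost[current]:
            continue  # stale heap entry; a cheaper route was found already
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or blocked[nr][nc]:
                continue
            new_g = g + 1
            if new_g < g_cost.get((nr, nc), float("inf")):
                g_cost[(nr, nc)] = new_g
                came_from[(nr, nc)] = current
                heapq.heappush(
                    open_heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc))
                )
    return None


if __name__ == "__main__":
    grid = [[False] * 3 for _ in range(3)]
    print(astar(grid, (0, 0), (0, 2)))  # straight path along the top row
    grid[0][1] = True                   # a newly perceived obstacle appears
    print(astar(grid, (0, 0), (0, 2)))  # replanned path detours via row 1
```

Because the search only sees a boolean grid, it does not matter whether a blocked cell came from a depth-derived geometric obstacle or from a segmented "Beware list" region; both are avoided in the same way.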