Generalizable Reinforcement Learning with Biologically Inspired Hyperdimensional Occupancy Grid Maps for Exploration and Goal-Directed Path Planning

Shay Snyder, Ryan Shea, Andrew Capodieci, David Gorsich, Maryam Parsa

TL;DR

The paper addresses the challenge of generalizing reinforcement learning policies for exploration and path planning by comparing a biologically inspired hyperdimensional occupancy grid mapping (VSA-OGM) against traditional OGM methods (BHM). By wrapping LiDAR data into an OGM format and using a PPO-based RL pipeline, the study demonstrates that VSA-OGM achieves comparable learning performance while markedly improving generalization to unseen environments, with gains up to approximately 53% on MarsExplorer and 47% on RaceCarGym. However, these generalization benefits come with increased computational and memory demands, including higher latency and larger memory footprints for VSA-OGM, especially under high-density LiDAR conditions. The results support VSA-OGM as a promising neuromorphic-compatible alternative for robust deployment in diverse environments, and point to future work on reducing encoding complexity and extending to model-based RL. The work thus contributes to scalable, generalizable perception-to-control pipelines for real-world autonomous systems.

Abstract

Real-time autonomous systems utilize multi-layer computational frameworks to perform critical tasks such as perception, goal finding, and path planning. Traditional methods implement perception using occupancy grid mapping (OGM), segmenting the environment into discretized cells with probabilistic information. This classical approach is well-established and provides a structured input for downstream processes like goal finding and path planning algorithms. Recent approaches leverage a biologically inspired mathematical framework known as vector symbolic architectures (VSA), commonly known as hyperdimensional computing, to perform probabilistic OGM in hyperdimensional space. This approach, VSA-OGM, provides native compatibility with spiking neural networks, positioning VSA-OGM as a potential neuromorphic alternative to conventional OGM. However, for large-scale integration, it is essential to assess the performance implications of VSA-OGM on downstream tasks compared to established OGM methods. This study examines the efficacy of VSA-OGM against a traditional OGM approach, Bayesian Hilbert Maps (BHM), within reinforcement learning based goal finding and path planning frameworks, across a controlled exploration environment and an autonomous driving scenario inspired by the F1-Tenth challenge. Our results demonstrate that VSA-OGM maintains comparable learning performance across single and multi-scenario training configurations while improving performance on unseen environments by approximately 47%. These findings highlight the increased generalizability of policy networks trained with VSA-OGM over BHM, reinforcing its potential for real-world deployment in diverse environments.

Paper Structure

This paper contains 5 sections, 6 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: The process of transforming the baseline observation spaces from polar-coordinate LiDAR scans to Cartesian coordinates located and oriented in the global reference frame. (A) The first and last LiDAR scans in polar coordinates. (B) The scans transformed into Cartesian coordinates. (C) The point clouds oriented based on the agent's location and orientation.
  • Figure 2: The qualitative results of training a convolutional policy network with Proximal Policy Optimization (Schulman 2017) for multiple levels in MarsExplorer (Koutras 2021) with VSA-OGM (Snyder 2024) and BHM (Senanayake 2017).
  • Figure 3: The generalization capabilities of policy networks trained with VSA-OGM (Snyder 2024) and BHM (Senanayake 2017) on unseen map layouts within the MarsExplorer environment (Koutras 2021).
  • Figure 4: The qualitative results of training a multi-headed policy network with Proximal Policy Optimization (Schulman 2017) for multiple tracks in RaceCarGym (Brunnbauer, racecar_gym) with VSA-OGM (Snyder 2024) and BHM (Senanayake 2017).
  • Figure 5: The generalizability of trained multi-headed policy networks when trained and evaluated on multiple maps. Austria, Berlin, and Treitlstrasse were used for training with all remaining maps being used for evaluation. All results are averaged over 5 evaluations.
  • ...and 1 more figure
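
The coordinate transformation described in the Figure 1 caption (polar scan, then Cartesian points, then points placed in the global frame using the agent's pose) can be sketched roughly as follows. This is a minimal illustration of a standard 2D rigid-body transform, not the authors' implementation; the function name `scan_to_global` and its parameters are hypothetical, and angles are assumed to be in radians measured from the agent's heading.

```python
import math


def scan_to_global(ranges, angles, agent_x, agent_y, agent_theta):
    """Convert one polar LiDAR scan to Cartesian points in the global frame.

    Hypothetical helper for illustration: `ranges` and `angles` are
    equal-length sequences (meters, radians); the agent pose is
    (agent_x, agent_y, agent_theta) in the global frame.
    """
    cos_t = math.cos(agent_theta)
    sin_t = math.sin(agent_theta)
    points = []
    for r, a in zip(ranges, angles):
        # Step (A) -> (B): polar to Cartesian in the agent's local frame.
        x_local = r * math.cos(a)
        y_local = r * math.sin(a)
        # Step (B) -> (C): rotate by the agent heading, then translate
        # by the agent position to reach the global frame.
        x_global = agent_x + x_local * cos_t - y_local * sin_t
        y_global = agent_y + x_local * sin_t + y_local * cos_t
        points.append((x_global, y_global))
    return points


# Usage: a single 1 m return straight ahead of an agent at (2, 3)
# whose heading is rotated 90 degrees counter-clockwise.
example = scan_to_global([1.0], [0.0], 2.0, 3.0, math.pi / 2)
```

Point clouds produced this way from successive scans share one reference frame, which is what allows them to be accumulated into a single occupancy map regardless of which mapping method consumes them.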