Local Path Planning among Pushable Objects based on Reinforcement Learning

Linghong Yao, Valerio Modugno, Andromachi Maria Delfaki, Yuanchang Liu, Danail Stoyanov, Dimitrios Kanoulas

TL;DR

The paper tackles local path planning among pushable obstacles (NAMO) by learning a non-axis-aligned pushing policy via Advantage Actor-Critic in parallel simulated agents using NVIDIA Isaac Gym, with domain randomization to bridge sim-to-real gaps. The state combines a semantic occupancy grid and a feature vector; the network outputs both value and continuous actions $(v_x, \dot{\theta})$ through a shared backbone, trained with a clipped surrogate objective and entropy regularization. A curriculum learning strategy across eight map layouts and robust sim-to-real transfer are demonstrated, showing high success in varied simulations (up to 91% in single-map, ~80% across maps, 54% in unseen maps) and successful real-world tests on a Unitree Go1. The work highlights non-linear obstacle manipulation and the potential to integrate with global planners like A* to enhance navigation in cluttered, uncertain environments.
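
As a rough illustration of the clipped surrogate objective with entropy regularization mentioned above, the sketch below computes the scalar loss for one batch in plain Python. It uses the standard textbook form of the objective, not necessarily the exact loss used in the paper. The function name, the clipping range `clip_eps=0.2`, and the entropy weight `entropy_coef=0.01` are assumptions chosen for illustration, and the value-function term is omitted.

```python
import math


def clipped_surrogate_loss(new_logp, old_logp, advantages, entropies,
                           clip_eps=0.2, entropy_coef=0.01):
    """Return the loss to minimise for one batch of sampled actions.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps and entropy_coef are
    illustrative defaults, not values reported in the paper.
    """
    n = len(advantages)
    assert n > 0 and len(new_logp) == len(old_logp) == len(entropies) == n

    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the two candidate objectives.
        surrogate += min(ratio * adv, clipped * adv)

    mean_surrogate = surrogate / n
    mean_entropy = sum(entropies) / n
    # Maximise surrogate plus entropy bonus, i.e. minimise the negative.
    return -(mean_surrogate + entropy_coef * mean_entropy)
```

The clipping limits how far a single update can move the policy away from the one that collected the data, while the entropy bonus discourages the action distribution from collapsing too early.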

Abstract

In this paper, we introduce a method to deal with the problem of robot local path planning among pushable objects -- an open problem in robotics. In particular, we achieve that by training multiple agents simultaneously in a physics-based simulation environment, utilizing an Advantage Actor-Critic algorithm coupled with a deep neural network. The developed online policy enables these agents to push obstacles in ways that are not limited to axial alignments, adapt to unforeseen changes in obstacle dynamics instantaneously, and effectively tackle local path planning in confined areas. We tested the method in various simulated environments to prove the adaptation effectiveness to various unseen scenarios in unfamiliar settings. Moreover, we have successfully applied this policy on an actual quadruped robot, confirming its capability to handle the unpredictability and noise associated with real-world sensors and the inherent uncertainties present in unexplored object pushing tasks.

Paper Structure (12 sections, 3 equations, 7 figures, 3 tables)


Figures (7)

  • Figure 1: Visual sensors capture the surroundings, and the data is processed into a state representation. This representation is fed into a pre-trained policy network, which outputs an action command that lets the robot navigate and solve the local path planning task.
  • Figure 2: Our approach utilizes a deep neural network for policy-making. The state representation, $s_t$, includes both a vector and a grid. The vector, of length $242$, contains data on the agent's current position, the corners of the object, the previous action taken, and the destination. The grid is a $48\times48$ matrix, with each cell semantically annotated. Initially, the vector is processed through a linear unit, while the grid undergoes processing through two convolutional units followed by a linear unit. The results from both processes are then merged and further processed through two additional linear units. Subsequently, the network splits, employing two distinct sets of weights to generate both the estimated value and the proposed action.
  • Figure 3: Fixed maps and random obstacle positions. Agents (gray) need to push object (yellow) to reach the goal (green). The maps include corridors (a,b), mid (c,d), side (e,f), and diagonal (g,h) doorways.
  • Figure 4: The progression of rewards through policy update iterations is depicted with the single map scenario in purple and the multi-map scenario in green. The phenomenon of curriculum learning is evident through the steep declines in the purple line, indicating the increasing difficulty of tasks; this effect is mitigated in the multi-map scenario, where the curve is more gradual since not all tasks advance through the curriculum at the same time. While the single map approach tends toward a steady state of balance, the multi-map approach demonstrates a need for early termination due to its varied progression.
  • Figure 5: Qualitative simulated results on single map setting: the agent adapts to different obstacle positions and finds efficient paths.
  • ...and 2 more figures
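
The two-branch network described in the Figure 2 caption can be traced in terms of tensor shapes. The sketch below is only a shape walk-through in plain Python. The vector length (242), the grid size (48×48), and the two-dimensional action come from the text above. The kernel sizes, strides, channel counts, hidden widths, and the single-channel grid input are all assumptions made for illustration and are not taken from the paper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(vec_len=242, grid=48,
                 conv_specs=((8, 5, 2), (16, 3, 2)),  # (channels, kernel, stride): assumed
                 vec_hidden=128, grid_hidden=128,      # assumed widths
                 trunk_hidden=(256, 128),              # assumed widths
                 action_dim=2):                        # (v_x, theta_dot)
    """Follow the data flow of the Figure 2 caption and record sizes."""
    shapes = {"vector_input": vec_len, "grid_input": (1, grid, grid)}

    # Vector branch: one linear unit.
    shapes["vector_branch"] = vec_hidden

    # Grid branch: two convolutional units, then one linear unit.
    side, channels = grid, 1
    for out_channels, kernel, stride in conv_specs:
        side = conv_out(side, kernel, stride)
        channels = out_channels
    shapes["conv_output"] = (channels, side, side)
    shapes["grid_flat"] = channels * side * side
    shapes["grid_branch"] = grid_hidden

    # Merge both branches, then two shared linear units.
    shapes["merged"] = vec_hidden + grid_hidden
    shapes["trunk"] = trunk_hidden[-1]

    # Two separate heads on top of the shared trunk.
    shapes["value_head"] = 1
    shapes["action_head"] = action_dim
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes().items():
        print(f"{name:>14}: {shape}")
```

With these assumed settings the 48×48 grid shrinks to 22×22 and then 10×10 before being flattened; different kernel or stride choices would change those intermediate sizes but not the overall structure.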