Two-step dynamic obstacle avoidance

Fabian Hart, Martin Waltz, Ostap Okhrin

TL;DR

Dynamic obstacle avoidance (DOA) is challenging when obstacles exhibit non-linear, uncertain motions. The paper introduces a two-step approach. Step 1 trains an LSTM in a supervised fashion to forecast obstacle trajectories, from which the collision risk (CR) metrics $d^{CPA}$ and $t^{CPA}$ (distance and time to the closest point of approach, CPA) are computed. Step 2 adds these CR estimates to the observations of a reinforcement learning (RL) agent to guide decision-making. Experiments in a generic 2D DOA environment and a maritime generalization experiment with AIS data show that CR-augmented RL roughly doubles the reward, which corresponds to halving the number of collisions, and converges faster, independent of the RL algorithm used. By coupling interpretable CPA-based risk metrics with data-driven trajectory prediction, the method improves the safety of local path planning and generalizes across transport domains.
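
For orientation, the classical constant-velocity form of the two CPA metrics is shown below. This is the textbook definition, not necessarily the exact formulation used in the paper, and the symbols $\Delta p$ (relative position of the obstacle with respect to the agent) and $\Delta v$ (relative velocity) are introduced here only for illustration:

$$
t^{CPA} = -\frac{\Delta p \cdot \Delta v}{\lVert \Delta v \rVert^{2}}, \qquad d^{CPA} = \left\lVert \Delta p + t^{CPA}\,\Delta v \right\rVert
$$

The formula assumes straight-line motion at constant speed ($\Delta v \neq 0$). The data-driven estimate in Step 1 addresses obstacles whose motion is non-linear, where this assumption no longer holds.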

Abstract

Dynamic obstacle avoidance (DOA) is a fundamental challenge for any autonomous vehicle, whether it operates at sea, in the air, or on land. This paper proposes a two-step architecture for handling DOA tasks by combining supervised and reinforcement learning (RL). In the first step, we introduce a data-driven approach to estimate the collision risk (CR) of an obstacle using a recurrent neural network, which is trained in a supervised fashion and offers robustness to non-linear obstacle movements. In the second step, we include these CR estimates in the observation space of an RL agent to increase its situational awareness. We illustrate the power of our two-step approach by training different RL agents in a challenging environment that requires navigating amid multiple obstacles. As an example, the non-linear movements of obstacles are modeled using stochastic processes and periodic patterns, although our architecture is suitable for any obstacle dynamics. The experiments reveal that integrating our CR metrics into the observation space doubles the performance in terms of reward, which is equivalent to halving the number of collisions in the considered environment. We also perform a generalization experiment to validate the proposal in an RL environment based on maritime traffic and real-world vessel trajectory data. Furthermore, we show that the architecture's performance improvement is independent of the applied RL algorithm.
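
The second step can be illustrated with a minimal sketch. It assumes that Step 1 has already produced a predicted obstacle trajectory (in the paper this would come from the trained recurrent network; here it is hand-written), that the agent's own future positions are available on the same time grid, and that the CR estimate is the minimum predicted distance and the time at which it occurs. The function names `cpa_from_trajectories` and `augment_observation` and the flat-list observation layout are hypothetical and are not taken from the paper.

```python
import math


def cpa_from_trajectories(agent_xy, obstacle_xy, dt):
    """Estimate (d_cpa, t_cpa) from two predicted 2D trajectories.

    Both trajectories are lists of (x, y) points sampled every `dt` seconds,
    starting at the current time. This is an illustrative discretization,
    not the paper's exact definition.
    """
    if not agent_xy or len(agent_xy) != len(obstacle_xy):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Distance between agent and obstacle at every predicted time step.
    dists = [
        math.hypot(ax - ox, ay - oy)
        for (ax, ay), (ox, oy) in zip(agent_xy, obstacle_xy)
    ]
    # Index of the first time step with the smallest separation.
    k = min(range(len(dists)), key=dists.__getitem__)
    return dists[k], k * dt


def augment_observation(base_obs, cr_estimates):
    """Append one (d_cpa, t_cpa) pair per obstacle to a flat observation."""
    obs = list(base_obs)
    for d_cpa, t_cpa in cr_estimates:
        obs.extend([d_cpa, t_cpa])
    return obs


# Agent moves along +x, obstacle moves along -y; their paths cross at (3, 0).
agent = [(float(k), 0.0) for k in range(6)]
obstacle = [(3.0, 3.0 - k) for k in range(6)]
d_cpa, t_cpa = cpa_from_trajectories(agent, obstacle, dt=1.0)
print(d_cpa, t_cpa)  # 0.0 3.0
print(augment_observation([1.0, 2.0], [(d_cpa, t_cpa)]))  # [1.0, 2.0, 0.0, 3.0]
```

Because the CR estimate enters only through the observation vector, the RL algorithm itself is left unchanged, which fits the abstract's claim that the improvement is independent of the algorithm applied.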

Paper Structure (29 sections, 26 equations, 19 figures, 2 tables)


Figures (19)

  • Figure 1: The proposed two-level architecture for DOA tasks.
  • Figure 1: Replacement of an obstacle (obstacle $1$) because two obstacles with the same passing rule have already passed the agent (negative time-to-collision). The obstacle's new $\mathrm{TTC}_{t,1}$ is drawn uniformly at random from the time interval colored turquoise, which has length $\Delta \mathrm{TTC}_{\rm max}$.
  • Figure 2: Training of the trajectory prediction module for periodic and stochastic obstacle trajectories.
  • Figure 2: Replacement of an obstacle, identical to the situation in the preceding obstacle-replacement figure but with additional information about the lateral positions of obstacles.
  • Figure 3: Trajectory prediction for stochastic obstacle movement with last observations in green and ground truth data in blue.
  • ...and 14 more figures