Action Mapping for Reinforcement Learning in Continuous Environments with Constraints

Mirco Theile, Lukas Dirnberger, Raphael Trumpp, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

TL;DR

The paper tackles sample efficiency and constraint handling in deep reinforcement learning for continuous action spaces. It introduces action mapping (AM), which learns a feasibility policy $\pi_f$ that maps a latent variable $z$ to the state-dependent set of feasible actions and composes it with an objective policy $\pi_o$ so that $\pi = \pi_f \circ \pi_o$, effectively turning a state-wise constrained MDP (SCMDP) into an unconstrained MDP. AM is instantiated with SAC and PPO (AM-SAC, AM-PPO) and evaluated on a robotic arm pose task with a perfect feasibility model and a spline-based path planning task with an approximate feasibility model, where it outperforms action replacement, resampling, projection, and Lagrangian baselines in learning speed and constraint adherence. The work demonstrates that imperfect feasibility models can provide valuable inductive bias and that latent representations enable multi-modal action distributions, with practical implications for constrained robotics and autonomous planning.
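The composition $\pi = \pi_f \circ \pi_o$ can be illustrated with a toy sketch. Everything below is a hypothetical stand-in rather than the authors' implementation: the feasible set is assumed to be a disk whose radius depends on the state, `feasibility_policy` is hand-coded instead of learned, and `objective_policy` simply samples a latent $z$ at random. The point is only to show that whatever $z$ the objective policy picks, the resulting action lands in the feasible set.

```python
import numpy as np


def feasible_radius(state):
    """Hypothetical feasibility model: actions are feasible inside a disk
    whose radius shrinks as the state norm grows (toy assumption)."""
    return 1.0 / (1.0 + np.linalg.norm(state))


def feasibility_policy(state, z):
    """pi_f: map a latent z in [-1, 1]^2 onto the feasible disk.
    Hand-coded here for illustration; in the paper pi_f is learned."""
    z = np.clip(z, -1.0, 1.0)
    norm = np.linalg.norm(z)
    max_abs = np.max(np.abs(z))
    # Radially squash the square [-1, 1]^2 onto the unit disk.
    unit_disk = z * (max_abs / norm) if norm > 0 else z
    # Shrink the unit disk to the state-dependent feasible radius.
    return feasible_radius(state) * unit_disk


def objective_policy(state, rng):
    """pi_o: stand-in for the learned objective policy (ignores the state
    and samples a random latent z in (-1, 1)^2)."""
    return np.tanh(rng.normal(size=2))


def policy(state, rng):
    """Composite policy: a = pi_f(s, pi_o(s))."""
    z = objective_policy(state, rng)
    return feasibility_policy(state, z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([0.5, -0.2, 1.0])
    action = policy(state, rng)
    print("action:", action, "feasible radius:", feasible_radius(state))
```

In this sketch the objective policy only ever chooses among latents, so an RL algorithm such as SAC or PPO training it would see an unconstrained problem over $z$.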

Abstract

Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.

Paper Structure

This paper contains 30 sections, 20 equations, 9 figures, 7 tables, 2 algorithms.

Figures (9)

  • Figure 1: Interaction architecture of action mapping, showing how a perfect feasibility policy $\pi_f$ can transform the SCMDP indicated by $\mathrm{P}$ into an unconstrained MDP with transition function $\mathrm{P}_f$ and action space $\mathcal{Z}$.
  • Figure 2: Robotic arm end-effector pose environment with obstacles in gray and the target pose to the left.
  • Figure 3: Spline-based path planning environment with constant velocity and non-holonomic constraints.
  • Figure 4: Training curves for the two applications with the results for the robot arm environment in (a)+(b) and for the path planning environment in (c)+(d). For each configuration, 3 agents were trained, with the curves showing the median and the region between highest and lowest performance. Both "Replacement" agents show no constraint violation in (b).
  • Figure 5: Visualization of 256 actions generated by $\pi_f^\theta$ for a given state (a) if $z$ is sampled uniformly, (b) if $z$ is sampled from the distribution of $\pi_o^{\phi_b}$, the objective policy at the beginning of training, and (c) if $z$ is sampled from the distribution of $\pi_o^{\phi_e}$, the objective policy at the end of training.
  • ...and 4 more figures
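
The transformation described in the Figure 1 caption can be written out explicitly. As a sketch, assuming the feasibility policy acts as a deterministic map $a = \pi_f(s, z)$ (an assumption made here for illustration; the formula is not quoted from the paper), the transition function seen by the objective policy is

$$\mathrm{P}_f(s' \mid s, z) = \mathrm{P}\big(s' \mid s, \pi_f(s, z)\big), \qquad z \in \mathcal{Z}.$$

If $\pi_f$ is perfect, every $z \in \mathcal{Z}$ maps to a feasible action, so the objective policy faces an unconstrained MDP over $\mathcal{Z}$.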