ManeuverNet: A Soft Actor-Critic Framework for Precise Maneuvering of Double-Ackermann-Steering Robots with Optimized Reward Functions

Kohio Deflesselle, Mélodie Daniel, Aly Magassouba, Miguel Aranda, Olivier Ly

TL;DR

ManeuverNet is a deep reinforcement learning (DRL) framework tailored for double-Ackermann systems that combines Soft Actor-Critic with CrossQ. It substantially improves maneuverability and success rates and mitigates the strong parameter sensitivity observed in the TEB planner.

Abstract

Autonomous control of double-Ackermann-steering robots is essential in agricultural applications, where robots must execute precise and complex maneuvers within a limited space. Classical methods, such as the Timed Elastic Band (TEB) planner, can address this problem, but they rely on parameter tuning, making them highly sensitive to changes in robot configuration or environment and impractical to deploy without constant recalibration. At the same time, end-to-end deep reinforcement learning (DRL) methods often fail due to unsuitable reward functions for non-holonomic constraints, resulting in sub-optimal policies and poor generalization. To address these challenges, this paper presents ManeuverNet, a DRL framework tailored for double-Ackermann systems, combining Soft Actor-Critic with CrossQ. Furthermore, ManeuverNet introduces four specifically designed reward functions to support maneuver learning. Unlike prior work, ManeuverNet does not depend on expert data or handcrafted guidance. We extensively evaluate ManeuverNet against both state-of-the-art DRL baselines and the TEB planner. Experimental results demonstrate that our framework substantially improves maneuverability and success rates, achieving more than a 40% gain over DRL baselines. Moreover, ManeuverNet effectively mitigates the strong parameter sensitivity observed in the TEB planner. In real-world trials, ManeuverNet achieved up to a 90% increase in maneuvering trajectory efficiency, highlighting its robustness and practical applicability.

Paper Structure (17 sections, 4 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Maneuver handling (left figure) with 4WS robots is challenging in DRL because it requires accepting a loss in the current reward (circled in red in the right figure), which makes classical approaches sub-optimal.
  • Figure 2: DASMR rotating around an instantaneous center of rotation (ICR).
  • Figure 3: Comparison between the shape of our reward functions (top) and classic state-of-the-art reward functions (bottom). X is the longitudinal axis, Y is the lateral axis, and the robot's center position is at the origin. Each point on the heatmaps gives the reward obtained when $\boldsymbol{X_d}$ lies at that position. Thus, the robot reaches $\boldsymbol{X_d}$ when $\|\boldsymbol{X_d} \| < d_\text{th}$.
  • Figure 4: Environment setup in simulation and real-world settings. The red square denotes the goal space. The blue square represents the robot's workspace. In real-world settings, both spaces coincide to impose stricter constraints on navigation and positioning.
  • Figure 5: An example of a maneuver executed by ManeuverNet in both simulation and real-world settings is shown in the first two rows. Another example, showcasing multi-terrain performance, is presented in the final row of figures. The red dot represents $\boldsymbol{X}_d$.
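
The reaching condition in the Figure 3 caption, $\|\boldsymbol{X_d}\| < d_\text{th}$ with the robot's center at the origin, can be illustrated with a minimal sketch. It assumes that $\boldsymbol{X_d}$ is a 2D target position expressed in the robot frame (X longitudinal, Y lateral). The function names, the world-frame pose inputs, and the threshold value are hypothetical and are not taken from the paper.

```python
import math


def goal_in_robot_frame(robot_x, robot_y, robot_yaw, goal_x, goal_y):
    """Express a world-frame goal in the robot frame (assumed 2D layout).

    The robot frame has its origin at the robot's center, X along the
    longitudinal axis and Y along the lateral axis.
    """
    dx = goal_x - robot_x
    dy = goal_y - robot_y
    c, s = math.cos(robot_yaw), math.sin(robot_yaw)
    # Rotate the world-frame offset by -yaw to align it with the robot axes.
    return (c * dx + s * dy, -s * dx + c * dy)


def goal_reached(x_d, d_th):
    """Return True when ||X_d|| < d_th, the condition stated in the caption."""
    return math.hypot(x_d[0], x_d[1]) < d_th


# Hypothetical example: robot at (1, 1) facing +Y in the world frame,
# goal 2 m straight ahead of it.
x_d = goal_in_robot_frame(1.0, 1.0, math.pi / 2, 1.0, 3.0)
print(goal_reached(x_d, d_th=0.1))
```

Working in the robot frame keeps the check independent of where the robot is in the world, which matches how the heatmaps are drawn around the origin.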