Autonomous Wheel Loader Navigation Using Goal-Conditioned Actor-Critic MPC
Aleksi Mäki-Penttilä, Naeim Ebrahimi Toulkani, Reza Ghabcheloo
TL;DR
This work addresses autonomous wheel loader navigation to arbitrary goal poses by embedding the critic of a goal-conditioned, Lyapunov-based actor-critic RL agent into a nonlinear MPC, enabling long-horizon planning within real-time limits. The critic informs both the MPC stage cost and terminal cost, while the MPC enforces actuator, state, and obstacle constraints, yielding time-efficient trajectories with safety guarantees. Key contributions include a Lyapunov-based RL training framework (ALAC) with a gradient penalty to stabilize learning, and a stage cost that is Taylor-expanded around the previous MPC solution, which preserves real-time solvability. Real-world experiments on an Avant 635 wheel loader demonstrate faster convergence than a trajectory optimization baseline, with simulations suggesting substantial speedups across diverse scenarios. The approach shows strong potential for practical autonomous operation in constrained, dynamic settings, although real-time performance in obstacle-rich scenes and occasional solver difficulties remain challenges, pointing to future enhancements such as control barrier functions.
Abstract
This paper proposes a novel control method for an autonomous wheel loader, enabling time-efficient navigation to an arbitrary goal pose. Unlike prior works which combine high-level trajectory planners with Model Predictive Control (MPC), we directly enhance the planning capabilities of MPC by incorporating a cost function derived from Actor-Critic Reinforcement Learning (RL). Specifically, we first train an RL agent to solve the pose reaching task in simulation, then transfer the learned planning knowledge to an MPC by incorporating the trained neural network critic as both the stage and terminal cost. We show through comprehensive simulations that the resulting MPC inherits the time-efficient behavior of the RL agent, generating trajectories that compare favorably against those found using trajectory optimization. We also deploy our method on a real-world wheel loader, where we demonstrate successful navigation in various scenarios.
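To make the cost structure concrete, the following minimal Python sketch shows one way a goal-conditioned critic could serve as both stage and terminal cost, with the stage cost replaced by a Taylor expansion around the previous solution. Everything here is an illustrative assumption rather than the authors' implementation: `toy_critic` is a hand-written stand-in for the trained neural network, the expansion is first order (the order used in the paper is not stated here), and gradients are taken by finite differences instead of automatic differentiation.

```python
import math


def toy_critic(state, goal):
    """Hypothetical stand-in for a trained goal-conditioned critic.

    state and goal are (x, y, heading) poses. The value is zero at the
    goal and grows with position and heading error.
    """
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    dtheta = goal[2] - state[2]
    return math.log1p(dx * dx + dy * dy) + 0.5 * (1.0 - math.cos(dtheta))


def gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def linearize(f, x_prev):
    """First-order Taylor expansion of f around x_prev (assumed order)."""
    f0 = f(x_prev)
    g = gradient(f, x_prev)

    def approx(x):
        return f0 + sum(gi * (xi - pi) for gi, xi, pi in zip(g, x, x_prev))

    return approx


def mpc_cost(trajectory, prev_trajectory, goal):
    """Expanded critic as stage cost, exact critic as terminal cost."""
    def critic(s):
        return toy_critic(s, goal)

    stage = sum(
        linearize(critic, p)(s)
        for s, p in zip(trajectory[:-1], prev_trajectory[:-1])
    )
    return stage + critic(trajectory[-1])


if __name__ == "__main__":
    goal = (5.0, 2.0, 0.0)
    previous = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.1), (2.0, 1.0, 0.2)]
    candidate = [(x + 0.01, y, th) for x, y, th in previous]
    print(mpc_cost(candidate, previous, goal))
```

In a real controller the expanded stage cost would be handed to a nonlinear programming solver together with the vehicle dynamics and the actuator, state, and obstacle constraints; the sketch only evaluates the objective for a given candidate trajectory.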
