Autonomous Wheel Loader Navigation Using Goal-Conditioned Actor-Critic MPC

Aleksi Mäki-Penttilä; Naeim Ebrahimi Toulkani; Reza Ghabcheloo

TL;DR

This work addresses autonomous wheel loader navigation to arbitrary goal poses by embedding a goal-conditioned critic, trained with Lyapunov-based Actor-Critic RL, into a nonlinear MPC, which enables long-horizon planning within real-time limits. The critic informs both the MPC stage and terminal costs, while the MPC enforces actuator, state, and obstacle constraints, yielding time-efficient trajectories with safety guarantees. Key contributions include a Lyapunov-based RL training framework (ALAC) with a gradient penalty to stabilize learning, and a stage cost that is Taylor-expanded around the previous solution to preserve real-time solvability. Real-world experiments on an Avant 635 wheel loader demonstrate faster convergence than a baseline trajectory optimization, and simulations suggest substantial speedups across diverse scenarios. The approach shows strong potential for practical autonomous operation in constrained, dynamic settings. Remaining challenges are real-time performance in obstacle-rich scenes and occasional solver difficulties, which point to future enhancements such as control barrier functions.
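The idea of expanding the stage cost around the previous solution can be illustrated with a small sketch. Everything here is an assumption for illustration: the summary does not state the expansion order, so a second-order expansion is used, derivatives are taken by finite differences rather than through a neural network, and `toy_critic` is a hypothetical smooth stand-in for the learned critic.

```python
import math
from typing import Callable, List

Vector = List[float]
CostFn = Callable[[Vector], float]


def gradient(f: CostFn, x: Vector, eps: float = 1e-4) -> Vector:
    """Central-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def hessian(f: CostFn, x: Vector, eps: float = 1e-4) -> List[Vector]:
    """Central-difference Hessian of f at x, symmetrized."""
    n = len(x)
    rows = []
    for i in range(n):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        g_hi, g_lo = gradient(f, hi, eps), gradient(f, lo, eps)
        rows.append([(a - b) / (2.0 * eps) for a, b in zip(g_hi, g_lo)])
    # Average with the transpose so the result is symmetric.
    return [[0.5 * (rows[i][j] + rows[j][i]) for j in range(n)] for i in range(n)]


def taylor_stage_cost(f: CostFn, x_prev: Vector) -> CostFn:
    """Return a quadratic model of f built around x_prev (the previous solution)."""
    c0 = f(x_prev)
    g = gradient(f, x_prev)
    h = hessian(f, x_prev)
    n = len(x_prev)

    def approx(x: Vector) -> float:
        d = [a - b for a, b in zip(x, x_prev)]
        linear = sum(gi * di for gi, di in zip(g, d))
        quadratic = 0.5 * sum(d[i] * h[i][j] * d[j] for i in range(n) for j in range(n))
        return c0 + linear + quadratic

    return approx


def toy_critic(x: Vector) -> float:
    """Hypothetical smooth cost standing in for the trained critic network."""
    return math.tanh(x[0]) ** 2 + x[0] * x[1] + 0.5 * x[1] ** 2


x_prev = [0.3, -0.2]  # point taken from the previous solution (assumed)
approx_cost = taylor_stage_cost(toy_critic, x_prev)
```

The quadratic model matches the original cost at `x_prev` and stays close to it nearby, which is why such an expansion is only trustworthy when consecutive solutions do not move far apart.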

Abstract

This paper proposes a novel control method for an autonomous wheel loader, enabling time-efficient navigation to an arbitrary goal pose. Unlike prior works which combine high-level trajectory planners with Model Predictive Control (MPC), we directly enhance the planning capabilities of MPC by incorporating a cost function derived from Actor-Critic Reinforcement Learning (RL). Specifically, we first train an RL agent to solve the pose reaching task in simulation, then transfer the learned planning knowledge to an MPC by incorporating the trained neural network critic as both the stage and terminal cost. We show through comprehensive simulations that the resulting MPC inherits the time-efficient behavior of the RL agent, generating trajectories that compare favorably against those found using trajectory optimization. We also deploy our method on a real-world wheel loader, where we demonstrate successful navigation in various scenarios.
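A minimal sketch of how a critic can serve as both stage and terminal cost in a finite-horizon objective is given below. It is not the authors' formulation: the unicycle-style `step` model, the `stand_in_critic` function (a squared pose error replacing the trained network), the weighting by `dt`, and all names are hypothetical, and no constraints or numerical solver are included.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, float, float]  # (x, y, heading): hypothetical pose
Control = Tuple[float, float]       # (speed, yaw rate): hypothetical inputs
Critic = Callable[[State, State], float]


def step(state: State, control: Control, dt: float = 0.1) -> State:
    """Simple unicycle update, used only as a placeholder vehicle model."""
    x, y, th = state
    v, w = control
    return (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + w * dt)


def stand_in_critic(state: State, goal: State) -> float:
    """Placeholder for the trained critic: squared pose error (lower is better)."""
    return sum((s - g) ** 2 for s, g in zip(state, goal))


def mpc_objective(
    state: State,
    goal: State,
    controls: Sequence[Control],
    critic: Critic,
    dt: float = 0.1,
) -> float:
    """Sum critic-based stage costs along the rollout, then add a critic terminal cost."""
    total = 0.0
    for u in controls:
        total += critic(state, goal) * dt  # stage cost at the current state
        state = step(state, u, dt)
    return total + critic(state, goal)     # terminal cost at the final state


start: State = (0.0, 0.0, 0.0)
goal: State = (1.0, 0.0, 0.0)
drive_forward = [(1.0, 0.0)] * 10  # constant speed toward the goal
stay_put = [(0.0, 0.0)] * 10       # no motion at all

cost_forward = mpc_objective(start, goal, drive_forward, stand_in_critic)
cost_idle = mpc_objective(start, goal, stay_put, stand_in_critic)
```

In this toy setup the sequence that drives toward the goal scores lower than the one that stays still. A real MPC would minimize this objective over the control sequence with a nonlinear solver while enforcing the actuator, state, and obstacle constraints.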

Paper Structure (17 sections, 1 theorem, 28 equations, 7 figures, 2 tables)

Introduction
Prior work
Actor-Critic Model Predictive Control
Lyapunov neural networks
Preliminaries
Methodology
Wheel loader kinematic model
Reinforcement Learning environment design
Actor and critic neural networks
Lyapunov-based Reinforcement Learning
Model Predictive Control problem formulation
Experiments And Results
Experimental setup
Baseline trajectory optimization
Highlighted scenarios
...and 2 more sections

Key Result

Theorem III.1

(Sampling-based Lyapunov stability [wang2023policyoptimizationmethodoptimaltime]). The mean cost stability of a system can be shown through a function $L(s, g): \mathcal{S} \times \mathcal{S} \rightarrow \mathbb{R}$ when it satisfies the conditions: where $k_l, k_u > 0$ and $k, \lambda \in (0, 1]$ are constants. When the conditions in (eq:mc_props) are met, then $L(s, g)$ is a valid sampling-based Ly

Figures (7)

Figure 1: Overview of the proposed control system.
Figure 2: Depiction of the wheel loader kinematic model.
Figure 3: The Avant 635 wheel loader used in our experiments.
Figure 4: Commanded and estimated velocities for scenario (b).
Figure 5: The three highlighted scenarios. (a) Short loading cycle (b) Compact 180-degree turn (c) Navigation through multiple obstacles. Scenarios (a) and (b) were evaluated in the real world, while scenario (c) was conducted using simulations. The time $t$ signifies the first time instant when $||x - g|| < 0.1$. For scenario (c) we illustrate trajectories for both $N=10$ and $N=20$ to highlight the dependence on a sufficiently long prediction horizon.
...and 2 more figures
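The reaching time $t$ used in the Figure 5 caption, the first instant at which $||x - g|| < 0.1$, can be computed from a logged trajectory as in the sketch below. The function name and the sample data are hypothetical, and the Euclidean norm over all logged state components is an assumption, since this summary does not say which components of $x$ enter the norm.

```python
import math
from typing import Optional, Sequence


def first_time_within(
    times: Sequence[float],
    states: Sequence[Sequence[float]],
    goal: Sequence[float],
    tol: float = 0.1,
) -> Optional[float]:
    """Return the first time stamp where ||x - g|| < tol, or None if never reached."""
    for t, x in zip(times, states):
        if math.dist(x, goal) < tol:
            return t
    return None


# Hypothetical logged trajectory approaching the goal at the origin.
times = [0.0, 0.5, 1.0, 1.5]
states = [(1.0, 0.0), (0.5, 0.0), (0.05, 0.0), (0.0, 0.0)]
reach_time = first_time_within(times, states, goal=(0.0, 0.0))
```

Returning `None` for trajectories that never enter the tolerance ball keeps failed runs distinguishable from slow ones.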

Theorems & Definitions (2)

Definition III.1
Theorem III.1
