Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools

Filippo A. Spinelli; Pascal Egli; Julian Nubert; Fang Nan; Thilo Bleumer; Patrick Goegler; Stephan Brockes; Ferdinand Hofmann; Marco Hutter

TL;DR

This work tackles autonomous control of a large hydraulic material handler with a free-swinging end-effector by combining data-driven modeling of the slew actuator with first-principles dynamics for the arm and tool. An RL policy, trained entirely in simulation with domain randomization, learns to command the slew and arm joints to reach 3D Cartesian targets while actively damping end-effector oscillations. Key contributions include a data-collection routine and NN for slew delay modeling, a hybrid simulation environment, and experimental validation on a 40 t prototype showing competitive performance relative to human operators and improved oscillation suppression. The results demonstrate a viable path toward autonomous operation of large material-handling machines, with implications for efficiency and safety in harsh environments.

Abstract

The precise and safe control of heavy material handling machines presents numerous challenges due to the hard-to-model hydraulically actuated joints and the need for collision-free trajectory planning with a free-swinging end-effector tool. In this work, we propose an RL-based controller that commands the cabin (slew) joint and the arm simultaneously. It is trained in a simulation that combines data-driven and first-principles modeling. On the one hand, we employ a neural network model to capture the highly nonlinear dynamics of the hydraulic motor that turns the upper carriage, incorporating explicit pressure prediction to better handle delays. On the other hand, we model the arm as velocity-controllable and the free-swinging end-effector tool as a damped pendulum using first principles. This combined model enhances our simulation environment, enabling the training of RL controllers that can be directly transferred to the real machine. Designed to reach steady-state Cartesian targets, the RL controller learns to leverage the hydraulic dynamics to improve accuracy, maintain high speeds, and minimize end-effector tool oscillations. Our controller, tested on a mid-size prototype material handler, is more accurate than an inexperienced operator and causes fewer tool oscillations. It demonstrates competitive performance even compared to an experienced professional driver.
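
As a rough illustration of the first-principles side of the model, the sketch below integrates a single planar pendulum with viscous damping whose pivot accelerates horizontally. It is a simplification under stated assumptions, not the paper's tool model: the length, damping coefficient, time step, and the function name `simulate_pendulum` are all hypothetical, and the paper's Figure 4 describes a double pendulum rather than the single one used here.

```python
import math

def simulate_pendulum(theta0, support_acc, length=2.0, damping=0.3,
                      dt=0.001, steps=20000, g=9.81):
    """Integrate a damped planar pendulum on a horizontally accelerating pivot.

    theta is measured from the downward vertical. All parameter values are
    illustrative assumptions, not values from the paper.
    """
    theta, omega = theta0, 0.0
    history = []
    for k in range(steps):
        a = support_acc(k * dt)  # pivot acceleration at the current time [m/s^2]
        # Lagrangian equation of motion plus a viscous damping term:
        # theta'' = -(g/l) sin(theta) - (a/l) cos(theta) - c * theta'
        alpha = (-(g / length) * math.sin(theta)
                 - (a / length) * math.cos(theta)
                 - damping * omega)
        # Semi-implicit Euler: update velocity first, then position.
        omega += alpha * dt
        theta += omega * dt
        history.append(theta)
    return history

# Free decay from 0.3 rad with a stationary pivot.
free_decay = simulate_pendulum(0.3, lambda t: 0.0)
```

With a stationary pivot the swing decays toward zero, which is the behavior a controller would try to accelerate by shaping the pivot motion.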

Paper Structure (28 sections, 10 equations, 8 figures, 6 tables)

Introduction
Related Work
Control of Hydraulic Machines
Control in the Presence of Passive Joints
Contributions
System Description
Feedback
Arm Velocity Controller
Proposed Approach
Slew Actuator Model
Data Collection
Data Augmentation
Neural Network Model
Simulation Environment
RL End-Effector Controller
...and 13 more sections

Figures (8)

Figure 1: The prototype material handler used in this work has an operational range of about 20 m and weighs more than 40 t. A 1.5 t grabbing shovel designed for loose material was employed, with a maximum load of 2 t.
Figure 2: Open-loop prediction using the NN model for a 40-second trapezoidal reference. This shape approximates a control profile while maintaining regularity to mitigate noise effects.
Figure 3: Arm controller dynamics modeled as first-order systems with delay (left), and tool dynamics modeled via Lagrange and dissipation (right).
Figure 4: The tool is modeled as a double pendulum with linearly oscillating support. In the left figure, we show the forces accounted for. The adopted approximations for each DoF are shown on the right.
Figure 5: Schematic of the ROS 2 interface. Nodes are oval, and the communication interfaces are represented in rectangular boxes with message rates. The RL Controller outputs three actions $[u_{\text{slew}}, \hat{\dot{q}}_{\text{boom}}, \hat{\dot{q}}_{\text{stick}}]$ at 10 Hz, interpreted by the Joint Controller to provide arm joystick inputs at 50 Hz, using FF and PI compensation, while maintaining a constant zero-order hold slew joystick signal for 5 iterations.
...and 3 more figures
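
The multi-rate scheme in the Figure 5 caption (one 10 Hz action expanded into five 50 Hz joystick commands) can be sketched as follows. This is a minimal illustration under assumptions: the class `PIWithFeedforward`, the function `run_cycle`, the gains, and the clamping of joystick commands to [-1, 1] are hypothetical and not taken from the paper, and no anti-windup is included.

```python
from dataclasses import dataclass

@dataclass
class PIWithFeedforward:
    """Velocity tracking with feedforward plus PI terms (gains are hypothetical)."""
    kff: float
    kp: float
    ki: float
    dt: float
    integral: float = 0.0

    def step(self, v_ref, v_meas):
        err = v_ref - v_meas
        self.integral += err * self.dt
        u = self.kff * v_ref + self.kp * err + self.ki * self.integral
        # Assumed joystick range of [-1, 1]; the real limits are not given here.
        return max(-1.0, min(1.0, u))

def run_cycle(action, measured, boom_ctrl, stick_ctrl, substeps=5):
    """Expand one 10 Hz action into `substeps` 50 Hz joystick commands.

    action   -- (u_slew, boom velocity reference, stick velocity reference)
    measured -- callable returning (boom velocity, stick velocity) at substep i
    """
    u_slew, v_boom_ref, v_stick_ref = action
    commands = []
    for i in range(substeps):
        v_boom, v_stick = measured(i)
        commands.append((
            u_slew,  # zero-order hold: slew command is constant within the cycle
            boom_ctrl.step(v_boom_ref, v_boom),
            stick_ctrl.step(v_stick_ref, v_stick),
        ))
    return commands

boom = PIWithFeedforward(kff=1.0, kp=0.5, ki=0.1, dt=0.02)
stick = PIWithFeedforward(kff=1.0, kp=0.5, ki=0.1, dt=0.02)
cmds = run_cycle((0.4, 0.2, -0.1), lambda i: (0.2, -0.1), boom, stick)
```

Holding the slew command constant while the arm loops run faster matches the caption's description; how the real Joint Controller tunes or saturates its terms is not specified in this excerpt.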
