
Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools

Filippo A. Spinelli, Pascal Egli, Julian Nubert, Fang Nan, Thilo Bleumer, Patrick Goegler, Stephan Brockes, Ferdinand Hofmann, Marco Hutter

TL;DR

This work tackles autonomous control of a large hydraulic material handler with a free-swinging end-effector by combining data-driven modeling of the slew actuator with first-principles dynamics for the arm and tool. A reinforcement learning (RL) policy, trained entirely in simulation with domain randomization, learns to command the slew and arm joints to reach 3D Cartesian targets while actively damping end-effector oscillations. Key contributions include a data-collection routine and a neural network (NN) for modeling the slew delay, a hybrid simulation environment, and experimental validation on a 40 t prototype that shows performance competitive with human operators and improved oscillation suppression. The results demonstrate a viable path toward autonomous operation of large material-handling machines, with implications for efficiency and safety in harsh environments.
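Domain randomization, as mentioned above, means resampling simulator parameters at the start of every training episode so that the policy cannot overfit to a single set of dynamics. The following is a minimal Python sketch of that idea. The shovel mass and the payload bound come from Figure 1. The other randomized quantities, their nominal values, the ±20% range, and the names `EpisodeParams` and `sample_episode_params` are hypothetical illustrations, not the paper's actual randomization scheme.

```python
import random
from dataclasses import dataclass

SHOVEL_MASS_KG = 1500.0   # 1.5 t grabbing shovel (Figure 1)
MAX_PAYLOAD_KG = 2000.0   # maximum load of 2 t (Figure 1)


@dataclass(frozen=True)
class EpisodeParams:
    """Hypothetical set of simulator parameters drawn once per episode."""
    tool_mass_kg: float     # shovel plus randomized payload
    tool_damping: float     # viscous swing damping [1/s], hypothetical nominal 0.5
    arm_tau_s: float        # arm first-order time constant [s], hypothetical nominal 0.3
    arm_delay_steps: int    # arm transport delay in control steps, hypothetical nominal 2


def sample_episode_params(rng: random.Random, rel_range: float = 0.2) -> EpisodeParams:
    """Draw one randomized parameter set; call this at every episode reset."""
    def jitter(nominal: float) -> float:
        # Uniform multiplicative perturbation in [1 - rel_range, 1 + rel_range].
        return nominal * rng.uniform(1.0 - rel_range, 1.0 + rel_range)

    return EpisodeParams(
        tool_mass_kg=SHOVEL_MASS_KG + rng.uniform(0.0, MAX_PAYLOAD_KG),
        tool_damping=jitter(0.5),
        arm_tau_s=jitter(0.3),
        arm_delay_steps=rng.randint(1, 3),  # randint is inclusive on both ends
    )
```

Passing an explicit `random.Random` instance keeps the sampling reproducible per training seed, and freezing the dataclass prevents a running episode from mutating its own parameters.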

Abstract

The precise and safe control of heavy material handling machines presents numerous challenges due to the hard-to-model hydraulically actuated joints and the need for collision-free trajectory planning with a free-swinging end-effector tool. In this work, we propose an RL-based controller that commands the cabin joint and the arm simultaneously. It is trained in a simulation combining data-driven modeling techniques with first-principles modeling. On the one hand, we employ a neural network model to capture the highly nonlinear dynamics of the upper carriage turn hydraulic motor, incorporating explicit pressure prediction to better handle delays. On the other hand, we model the arm as velocity-controllable and the free-swinging end-effector tool as a damped pendulum using first principles. This combined model enhances our simulation environment, enabling the training of RL controllers that can be directly transferred to the real machine. Designed to reach steady-state Cartesian targets, the RL controller learns to leverage the hydraulic dynamics to improve accuracy, maintain high speeds, and minimize end-effector tool oscillations. Our controller, tested on a mid-size prototype material handler, is more accurate than an inexperienced operator and causes fewer tool oscillations. It also demonstrates competitive performance compared to an experienced professional driver.
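The abstract describes modeling the free-swinging tool as a damped pendulum from first principles. As an illustration of that idea only, the sketch below integrates a single planar pendulum whose pivot accelerates horizontally with $a(t)$ (a simplification of the boom tip being moved by the arm and slew), with viscous damping $c$ added: $\ddot{\theta} = -\frac{g}{L}\sin\theta - \frac{a(t)}{L}\cos\theta - c\,\dot{\theta}$. The single-DoF equation, the length and damping values, and the name `simulate_tool_swing` are assumptions. According to Figure 4, the paper itself uses a double pendulum on a linearly oscillating support.

```python
import math


def simulate_tool_swing(
    pivot_accel,            # callable t -> horizontal pivot acceleration [m/s^2]
    length_m: float = 5.0,  # hypothetical pendulum length [m]
    damping: float = 0.2,   # hypothetical viscous damping coefficient c [1/s]
    theta0: float = 0.0,    # initial swing angle from vertical [rad]
    omega0: float = 0.0,    # initial angular velocity [rad/s]
    dt: float = 0.01,
    duration_s: float = 20.0,
    g: float = 9.81,
):
    """Integrate theta'' = -(g/L) sin(theta) - (a/L) cos(theta) - c * theta'.

    Uses semi-implicit (symplectic) Euler and returns (theta, omega, angle history).
    """
    theta, omega = theta0, omega0
    history = [theta]
    for k in range(int(round(duration_s / dt))):
        a = pivot_accel(k * dt)
        alpha = (-(g / length_m) * math.sin(theta)
                 - (a / length_m) * math.cos(theta)
                 - damping * omega)
        omega += dt * alpha   # update velocity first ...
        theta += dt * omega   # ... then position with the new velocity
        history.append(theta)
    return theta, omega, history


# Example: accelerate the pivot for 2 s, then stop, and inspect the residual swing.
_, _, swing = simulate_tool_swing(lambda t: 1.0 if t < 2.0 else 0.0, duration_s=30.0)
residual_peak = max(abs(x) for x in swing[200:])
```

Under a constant pivot acceleration, the damped pendulum settles at $\tan\theta = -a/g$. When the pivot stops, it swings freely around vertical while the residual oscillation decays at a rate set by $c$. A controller that shapes $a(t)$ can reduce exactly that residual.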

Paper Structure (28 sections, 10 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: The prototype material handler used in this work has an operational range of about 20 m and weighs more than 40 t. A 1.5 t grabbing shovel designed for loose material was employed, with a maximum load of 2 t.
  • Figure 2: Open-loop prediction using the NN model for a 40-second trapezoidal reference. This shape approximates a control profile while maintaining regularity to mitigate noise effects.
  • Figure 3: Arm controller dynamics modeled as first-order systems with delay (left), and tool dynamics derived via Lagrangian mechanics with a dissipation term (right).
  • Figure 4: The tool is modeled as a double pendulum with a linearly oscillating support. The left panel shows the forces taken into account; the right panel shows the approximation adopted for each DoF.
  • Figure 5: Schematic of the ROS 2 interface. Nodes are shown as ovals, and communication interfaces as rectangular boxes annotated with their message rates. The RL Controller outputs three actions $[u_{slew}, \hat{\dot{q}}_{boom}, \hat{\dot{q}}_{stick}]$ at 10 Hz. The Joint Controller interprets them to provide arm joystick inputs at 50 Hz using feedforward (FF) and PI compensation, while holding the slew joystick signal constant (zero-order hold) for 5 iterations.
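The multi-rate loop in Figure 5 can be sketched as follows, assuming each arm joint behaves as a first-order lag with a pure delay (the structure Figure 3 describes). The policy acts at 10 Hz. A 50 Hz joint controller turns the boom and stick velocity references into joystick commands using feedforward plus PI, and the slew command is held for 5 inner iterations. The gains, time constants, delays, the joystick range of [-1, 1], and all class and function names are hypothetical. The slew dynamics, which the paper models with a neural network, are not simulated; only the held slew command is logged.

```python
from collections import deque
from dataclasses import dataclass, field

POLICY_HZ = 10
JOINT_HZ = 50
HOLD_STEPS = JOINT_HZ // POLICY_HZ  # 5 inner iterations per policy action


@dataclass
class FirstOrderDelayJoint:
    """Hypothetical arm joint: velocity tracks gain * joystick via a pure delay and a first-order lag."""
    gain: float          # steady-state joint velocity per unit joystick [rad/s]
    tau_s: float         # first-order time constant [s]
    delay_steps: int     # transport delay in 50 Hz steps
    dt: float = 1.0 / JOINT_HZ
    velocity: float = 0.0
    buffer: deque = field(init=False, default_factory=deque)

    def __post_init__(self) -> None:
        # Pre-fill the delay line with zero commands.
        self.buffer = deque([0.0] * self.delay_steps)

    def step(self, joystick: float) -> float:
        self.buffer.append(joystick)
        delayed = self.buffer.popleft()  # command issued delay_steps iterations ago
        # Forward-Euler update of tau * v' = gain * u_delayed - v.
        self.velocity += self.dt / self.tau_s * (self.gain * delayed - self.velocity)
        return self.velocity


@dataclass
class VelocityPIFF:
    """Hypothetical joint controller: feedforward on the reference plus PI on the velocity error."""
    ff_gain: float       # assumed inverse of the joystick-to-velocity gain
    kp: float
    ki: float
    dt: float = 1.0 / JOINT_HZ
    integral: float = 0.0

    def command(self, v_ref: float, v_meas: float) -> float:
        err = v_ref - v_meas
        self.integral += err * self.dt  # no anti-windup in this sketch
        u = self.ff_gain * v_ref + self.kp * err + self.ki * self.integral
        return max(-1.0, min(1.0, u))   # joystick range assumed to be [-1, 1]


def run(policy, boom, stick, boom_ctrl, stick_ctrl, n_policy_steps):
    """Run a 10 Hz policy over a 50 Hz inner loop; return the slew joystick signal at 50 Hz."""
    slew_signal = []
    for _ in range(n_policy_steps):
        obs = (boom.velocity, stick.velocity)
        u_slew, v_boom_ref, v_stick_ref = policy(obs)  # one action per 10 Hz tick
        for _ in range(HOLD_STEPS):
            boom.step(boom_ctrl.command(v_boom_ref, boom.velocity))
            stick.step(stick_ctrl.command(v_stick_ref, stick.velocity))
            slew_signal.append(u_slew)  # zero-order hold of the slew command
    return slew_signal
```

Deriving `HOLD_STEPS` from the two rates keeps the zero-order hold consistent if either rate changes. The feedforward term does most of the tracking when `ff_gain` matches the joint's true gain, and the PI term only corrects the residual. A real implementation would likely also need anti-windup, because the output is clipped.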