3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints
Yoonkyu Yoo, Donghwi Jung, Seong-Woo Kim
TL;DR
This work addresses precise 3D control of hydraulic excavators by extending the action space to include cabin swing rotation, forming a 4-DOF model that enables 3D bucket motion. It introduces independent per-joint rewards for stable learning, and trains a TD3-based agent in simulation before validating on a real Hyundai excavator, achieving high accuracy and often outperforming a skilled operator. The approach demonstrates robust simulation-to-real transfer across linear and slope tasks and emphasizes practical applicability for construction sites. The key contributions are the 3D action-space expansion, independent reward design, and successful real-world validation, enabling continuous autonomous operation without human intervention.
Abstract
In this paper, we propose a reinforcement learning-based control algorithm that employs an independent reward for each joint to control excavators in 3D space. Excavators are used extensively on construction sites, but their hydraulic structure makes them difficult to control precisely. Precise operation has traditionally relied on operator expertise, which occasionally results in accidents. Equation-based control algorithms have therefore been proposed to achieve precise excavator control. However, these methods require prior knowledge of the excavator's physical parameters, which makes them unsuitable for the diverse range of excavators used in the field. To overcome this limitation, we explore reinforcement learning-based control methods, which do not require prior knowledge of specific equipment and instead train models from data. Existing reinforcement learning-based methods, however, overlook cabin swing rotation and confine the bucket's workspace to a 2D plane, which limits the applicability of the algorithm on construction sites. We address this issue by incorporating cabin swing rotation, expanding the bucket's workspace from a 2D plane into 3D space. With a 3D workspace, excavators can execute continuous operations without human intervention. To accomplish this, we set a distinct target for each joint, so that the action value for each joint is trained independently, regardless of the learning progress of the other joints.
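The idea of an independent reward per joint can be illustrated with a minimal Python sketch. It is not the paper's reward function. The joint names (`swing`, `boom`, `arm`, `bucket`), the negative absolute-error reward, and the `scale` parameter are all assumptions chosen for illustration. The sketch contrasts a reward vector with one entry per joint against a single summed reward shared by all joints.

```python
from typing import List, Sequence

# Assumed joint order for a 4-DOF excavator model (names are illustrative).
JOINTS = ("swing", "boom", "arm", "bucket")


def per_joint_rewards(
    angles: Sequence[float],
    targets: Sequence[float],
    scale: float = 1.0,
) -> List[float]:
    """Return one reward per joint, based only on that joint's own error.

    Assumed reward form: negative absolute distance to the joint's target.
    """
    if len(angles) != len(JOINTS) or len(targets) != len(JOINTS):
        raise ValueError(f"expected {len(JOINTS)} joint values")
    return [-scale * abs(a - t) for a, t in zip(angles, targets)]


def shared_reward(
    angles: Sequence[float],
    targets: Sequence[float],
    scale: float = 1.0,
) -> float:
    """Baseline for comparison: a single scalar that sums all joint errors."""
    return sum(per_joint_rewards(angles, targets, scale))


# Example: only the swing joint is away from its target (angles in radians).
angles = [0.5, 0.2, -0.3, 0.1]
targets = [0.0, 0.2, -0.3, 0.1]

rewards = per_joint_rewards(angles, targets)
for name, reward in zip(JOINTS, rewards):
    print(f"{name}: {reward}")
print("shared:", shared_reward(angles, targets))
```

In this example only the swing entry is nonzero, so the feedback for the boom, arm, and bucket is unaffected by the swing error. With the summed reward, all four joints would receive the same penalty even though three of them are already at their targets.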
