ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers

Liangliang Chen, Shiyu Jin, Haoyu Wang, Liangjun Zhang

TL;DR

ExACT tackles the challenge of end-to-end autonomous excavation by learning a controller that maps raw LiDAR, camera, and joint-position observations directly to valve commands using Action Chunking with Transformers (ACT). The system combines imitation learning with temporal ensembling and a conditional variational autoencoder to generate smooth action sequences, learned from a small set of human demonstrations and validated in a simulator built from real-world data. Results show successful execution of reach, dig_dump, and dig_dump_return tasks in the simulator, with limitations in high-frequency valve dynamics for digging that suggest the need for more demonstrations or higher control bandwidth. This work constitutes a first demonstration of end-to-end imitation-learning-based autonomous excavation, offering a path toward real-world deployment and enhanced safety in construction and mining settings.
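
The temporal ensembling mentioned above can be illustrated with a small sketch. In ACT, the policy is queried at every timestep and predicts a chunk of future actions, so several overlapping predictions exist for any given timestep; these are averaged with exponential weights exp(-m * i), where i = 0 is the oldest prediction. The code below is a minimal pure-Python illustration under that scheme, not ExACT's implementation: the names `temporal_ensemble` and `run_policy`, the toy policy interface, and the default m = 0.01 are assumptions, and the source does not state the chunk size or decay rate ExACT uses.

```python
import math


def temporal_ensemble(predictions, m=0.01):
    """Blend every prediction made for one timestep into a single action.

    predictions: list of action vectors (lists of floats), oldest first.
    m: decay rate (assumed value); weight i is exp(-m * i), i = 0 is oldest.
    """
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    return [
        sum(w * p[d] for w, p in zip(weights, predictions)) / total
        for d in range(dim)
    ]


def run_policy(policy, observations, chunk_size, m=0.01):
    """Query a chunked policy at every step and execute the ensembled action.

    policy: hypothetical callable mapping one observation to a list of
            chunk_size action vectors (the predicted action chunk).
    """
    # buffer[t] collects every prediction that targets timestep t,
    # in the order the predictions were made (oldest first).
    buffer = [[] for _ in range(len(observations) + chunk_size)]
    executed = []
    for t, obs in enumerate(observations):
        chunk = policy(obs)
        for k, action in enumerate(chunk):
            buffer[t + k].append(action)
        executed.append(temporal_ensemble(buffer[t], m))
    return executed


if __name__ == "__main__":
    # Toy policy: predicts the current observation value for the next 3 steps.
    toy_policy = lambda obs: [[float(obs)] for _ in range(3)]
    print(run_policy(toy_policy, [0, 1, 2, 3], chunk_size=3))
```

Because each executed action is a weighted average over overlapping chunks, abrupt changes between consecutive predictions are smoothed out, at the cost of querying the policy at every step.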

Abstract

Excavators are crucial for diverse tasks such as construction and mining, and autonomous excavator systems can enhance safety and efficiency, address labor shortages, and improve human working conditions. Unlike existing modular approaches, this paper introduces ExACT, an end-to-end autonomous excavator system that processes raw LiDAR, camera data, and joint positions to control excavator valves directly. Built on the Action Chunking with Transformers (ACT) architecture, ExACT uses imitation learning to take observations from multi-modal sensors as inputs and generate action sequences. In our experiments, we build a simulator from captured real-world data to model the relationship between excavator valve states and joint velocities. With only a few human-operated demonstration trajectories, ExACT completes different excavation tasks, including reaching, digging, and dumping, in validation with the simulator. To the best of our knowledge, ExACT is the first step toward building an end-to-end autonomous excavator system via imitation learning with a minimal set of human demonstrations. A video of this work is available at https://youtu.be/NmzR_Rf-aEk.
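
To make the idea of a data-driven simulator concrete, the sketch below fits a mapping from valve state to joint velocity on logged samples and then integrates the predicted velocity to roll a joint position forward. This is purely illustrative: the abstract does not say what model ExACT's simulator uses, so the per-joint linear form, the function names `fit_valve_model` and `simulate_joint`, and the sample data are all assumptions.

```python
def fit_valve_model(valve_states, joint_velocities):
    """Least-squares fit of velocity = gain * valve + offset for one joint.

    The linear form is an assumption made for illustration only.
    """
    n = len(valve_states)
    mean_u = sum(valve_states) / n
    mean_v = sum(joint_velocities) / n
    var_u = sum((u - mean_u) ** 2 for u in valve_states)
    cov_uv = sum(
        (u - mean_u) * (v - mean_v)
        for u, v in zip(valve_states, joint_velocities)
    )
    gain = cov_uv / var_u
    offset = mean_v - gain * mean_u
    return gain, offset


def simulate_joint(q0, valve_commands, gain, offset, dt):
    """Integrate predicted joint velocity with forward Euler steps."""
    q = q0
    trajectory = [q]
    for u in valve_commands:
        q += (gain * u + offset) * dt
        trajectory.append(q)
    return trajectory


if __name__ == "__main__":
    # Made-up logged samples (valve state, joint velocity) for one joint.
    gain, offset = fit_valve_model([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
    print(simulate_joint(0.0, [1.0, 1.0], gain, offset, dt=0.1))
```

A policy that outputs valve commands can then be evaluated in closed loop by feeding its actions through such a model, which is the role the simulator plays in the validations described here.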

Paper Structure

This paper contains 11 sections, 1 equation, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Framework of ExACT
  • Figure 2: Examples of the front camera image and LiDAR elevation maps. (a) Raw LiDAR elevation map (visualized in RViz) from which the digging (red box) and dumping (green box) LiDAR elevation maps are cropped; (b) Preprocessed LiDAR elevation map of the digging zone; (c) Preprocessed LiDAR elevation map of the dumping zone; (d) Front camera image.
  • Figure 3: Test performance of the task reach (valve state control)
  • Figure 4: Ground truth and predicted actions of the tasks (a) dig_dump and (b) dig_dump_return
  • Figure 5: Front camera images and bucket trajectory during the testing of the task dig_dump_return (joint position control)