Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning

Yuki Kadokawa, Hirotaka Tahara, Takamitsu Matsubara

TL;DR

The paper tackles the high computational cost of learning fine-resolution policies for rock excavation by introducing Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies from coarse to fine simulations through middle-resolution stages with conservative policy transfer. PRPD is implemented within a variable-resolution Isaac Gym simulator and leverages PPO-based policy learning, a conservative KL-based transfer, and an auxiliary Q-function to estimate transfer dynamics. Empirical results show that PRPD reduces sampling time to less than 1/7 of that required by fine-resolution training while preserving comparable task success rates, and that the learned policies transfer to nine real-world rock environments. The work also discusses practical considerations of resolution scheduling and loss-term balancing, as well as applicability to other complex sim-to-real tasks. Overall, PRPD offers a time-efficient route to RL for particle-based excavation tasks.

Abstract

In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, which requires skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution is defined by the size and number of particles in the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant computation time, which makes sample collection challenging, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations as pre-training for fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through several middle-resolution simulations with conservative policy transfer to avoid domain gaps that could lead to policy transfer failure. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than 1/7 while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation. Additional videos and supplementary results are available on our project page: https://yuki-kadokawa.github.io/prpd/

Paper Structure

This paper contains 41 sections, 10 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of proposed framework: Fine-resolution simulations yield high policy performance but require long learning times, while coarse-resolution simulations allow for quick learning but perform poorly in sim-to-real transfer. Our framework starts with coarse-resolution simulations for quick learning and progressively transfers policies to fine-resolution simulations. Progressive resolution shift with conservative policy transfer is applied to avoid large domain gaps that could lead to policy transfer failure. This approach balances learning time with real-world performance.
  • Figure 2: Our experimental rock excavation setup: The excavator operates a bucket attached to its arm to remove rocks from the soil. Inputs to the control policy include the bucket's posture $p^b$ (position and rotation) from the excavator's absolute encoder, rock coordinates $p^r$ estimated by the camera, and the presence of rocks in the bucket $f^r$ estimated by the force sensor. The outputs of the control policy are the bucket's position $a^{\text{XYZ}}$ and rotation $a^{\text{Pitch}}$. The fork-shaped bucket is designed to imitate the features of skeleton buckets.
  • Figure 3: Applying PRPD to our rock excavation simulator: The resolution scheduler progressively changes the simulation resolution. The simulation generator creates the environment (soil, rocks, bucket) at this resolution. At each resolution, agents collect samples and update policies.
  • Figure 4: Snapshots of simulation environments with each resolution
  • Figure 5: Comparison of learning time (top) and policy mixture rate (bottom): (Top) Learning time for fixed-resolution learning versus PRPD. $\Delta=70, \cdots, \Delta=10$ refer to the resolutions listed in Table \ref{table:calculation_time_rate_list}, while "Mix" refers to mixed resolutions (simultaneous policy learning); both are run for up to $400\times128\times128$ samples (circle points are plotted every $100\times128\times128$ samples). The success rate is evaluated 100 times in $\Delta=10$ at each iteration. (Bottom) Task success rate and mixture rate transitions of PRPD at each resolution. The dashed lines indicate when the scheduler changes resolution after the target success rate $\hat{\tau}$ is achieved. $\Delta=70$ is the initial environment and has no previous policy, so its mixture rate is omitted. Each curve plots the mean (and variance) over five experiments.
  • ...and 2 more figures
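
The resolution scheduling described in the captions of Figures 3 and 5 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `ResolutionScheduler`, the function `run_schedule`, and the callback `train_and_evaluate` are hypothetical names, and the evenly spaced resolution list in the usage note is an assumption (the caption only gives $\Delta=70, \cdots, \Delta=10$). The conservative policy transfer and the policy mixture rate are deliberately left out; the sketch covers only the rule that the scheduler moves to the next finer resolution once the target success rate $\hat{\tau}$ is reached.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ResolutionScheduler:
    """Hypothetical scheduler: walks from coarse to fine resolutions."""
    resolutions: List[int]   # ordered coarse -> fine, e.g. starting at delta=70
    target_success: float    # stands in for the target success rate tau-hat
    index: int = 0           # position in `resolutions`

    @property
    def current(self) -> int:
        return self.resolutions[self.index]

    @property
    def at_finest(self) -> bool:
        return self.index == len(self.resolutions) - 1

    def update(self, success_rate: float) -> bool:
        """Advance one step if the target is met; return True if switched."""
        if success_rate >= self.target_success and not self.at_finest:
            self.index += 1
            return True
        return False


def run_schedule(
    scheduler: ResolutionScheduler,
    train_and_evaluate: Callable[[int], float],
    max_iterations: int,
) -> List[Tuple[int, float]]:
    """Train at the current resolution until the finest one reaches the target.

    `train_and_evaluate(delta)` is an assumed callback that performs one
    policy-update iteration at resolution `delta` and returns a success rate.
    """
    history = []
    for _ in range(max_iterations):
        delta = scheduler.current
        success = train_and_evaluate(delta)
        history.append((delta, success))
        # Stop once the finest resolution itself meets the target.
        if scheduler.at_finest and success >= scheduler.target_success:
            break
        scheduler.update(success)
    return history
```

A caller would construct the scheduler with its own resolution list, for example `ResolutionScheduler(resolutions=[70, 60, 50, 40, 30, 20, 10], target_success=0.8)`, where both the spacing and the threshold value are placeholders rather than values taken from the paper.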