Multi-log grasping using reinforcement learning and virtual visual servoing

Erik Wallin, Viktor Wiberg, Martin Servin

TL;DR

The paper tackles autonomous forwarding in forestry, focusing on multi-log grasping under unstructured outdoor conditions. It introduces a virtual camera that renders images from 3D reconstructions of log piles and uses Cartesian crane control to ease sim-to-real transfer. A model-free RL agent (PPO) is trained in simulation to grasp one or more logs from piles of 2–5 logs using 64×64 camera streams plus crane-state observations. Key contributions include the virtual visual servoing framework, a dense camera-based reward design, curriculum-driven RL training reaching 95% success, and ablation analyses showing which observations drive the learned behavior. The work outlines a practical, modular path toward autonomous log-forwarding systems that can integrate with log segmentation pipelines and real-world crane control.

Abstract

We explore multi-log grasping using reinforcement learning and virtual visual servoing for automated forwarding in a simulated environment. Automation of forest processes is a major challenge, and many techniques regarding robot control pose different challenges due to the unstructured and harsh outdoor environment. Grasping multiple logs involves various problems of dynamics and path planning, where understanding the interaction between the grapple, logs, terrain, and obstacles requires visual information. To address these challenges, we separate image segmentation from crane control and utilise a virtual camera to provide an image stream from reconstructed 3D data. We use Cartesian control to simplify domain transfer to real-world applications. Since log piles are static, visual servoing using a 3D reconstruction of the pile and its surroundings is equivalent to using real camera data until the point of grasping. This relaxes the limits on computational resources and time for the challenge of image segmentation and allows for collecting data in situations where the log piles are not occluded. The disadvantage is the lack of information during grasping. We demonstrate that this problem is manageable and present an agent that is 95% successful in picking one or several logs from challenging piles of 2–5 logs.
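
As a rough illustration of the virtual-camera idea, the sketch below renders a 64×64 depth image from a reconstructed 3D point set. It is not the authors' renderer: the top-down orthographic projection, the function name `render_depth`, and all of its parameters are assumptions made for this example.

```python
def render_depth(points, center, extent, camera_height, size=64):
    """Top-down orthographic depth image of a 3D point set (illustrative only).

    points        -- iterable of (x, y, z) tuples from a 3D reconstruction
    center        -- (x, y) position of the hypothetical virtual camera
    extent        -- side length of the square footprint the camera sees
    camera_height -- z coordinate of the camera; depth = camera_height - z
    size          -- output resolution (64 matches the image size in the TL;DR)
    """
    cx, cy = center
    half = extent / 2.0
    # Pixels that no point falls into keep the background depth (assumed here
    # to be the distance from the camera down to z = 0).
    image = [[float(camera_height)] * size for _ in range(size)]
    for x, y, z in points:
        # Fractional position of the point inside the square footprint.
        u = (x - (cx - half)) / extent
        v = (y - (cy - half)) / extent
        if not (0.0 <= u < 1.0 and 0.0 <= v < 1.0):
            continue  # point lies outside the camera footprint
        row, col = int(v * size), int(u * size)
        depth = camera_height - z
        # Keep the surface nearest to the camera in each pixel.
        if depth < image[row][col]:
            image[row][col] = depth
    return image
```

Because the pile is static, an image like this can be re-rendered from any camera pose without new sensor data, which is the property the abstract relies on up to the point of grasping.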

Paper Structure

This paper contains 15 sections, 3 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Illustration of the virtual camera setup, showing (a) the actual pile, (b) the corresponding 3D reconstruction, and (c,d) the depth and greyscale virtual camera streams. The position of the virtual camera is represented by a dot, with the orientation and extent illustrated by the dashed square.
  • Figure 2: Examples of piles, with corresponding depth and RGB images for eight piles with 2–5 logs. The elevation difference of the terrains used ranges from 0.2 m to 0.8 m, with a mean of 0.4 m.
  • Figure 3: The Xt28 concept forwarder with the Cranab FC12 crane mounted. The semi-transparent blue boxes show the simplified grapple geometry.
  • Figure 4: Evaluation curves during training of the selected agent, showing reward, lesson number, and smoothed reward using a sliding window of size 10. (a) shows training with two logs and a restricted radius range. (b,c) show training with 2–5 logs, with the grey regions highlighting the final lesson with the non-simplified task. The lesson number maps to the difficulty parameter $d$, as described in Section \ref{sec:curriculum}.
  • Figure 5: (a) shows an overall success rate of 95%, (b) shows how many logs are grasped, and (c) shows the success rate given the number of logs in the pile.
  • ...and 5 more figures
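
The smoothed reward curve in Figure 4 uses a sliding window of size 10. A minimal sketch of such a smoother follows, assuming a trailing (causal) mean; the caption does not say whether the window is trailing or centred, and `sliding_mean` is a hypothetical helper name.

```python
from collections import deque


def sliding_mean(values, window=10):
    """Trailing moving average; early entries average over fewer samples."""
    buf = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for v in values:
        if len(buf) == window:
            # The oldest sample is about to be evicted by append().
            total -= buf[0]
        buf.append(v)
        total += v
        smoothed.append(total / len(buf))
    return smoothed


if __name__ == "__main__":
    rewards = [0.0, 1.0, 0.0, 1.0, 1.0]  # made-up values, not from the paper
    print(sliding_mean(rewards, window=2))
```

A trailing window avoids looking ahead in the training run, at the cost of a lag of roughly half the window length relative to a centred average.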