
High-throughput 3D shape completion of potato tubers on a harvester

Pieter M. Blok, Federico Magistri, Cyrill Stachniss, Haozhou Wang, James Burridge, Wei Guo

TL;DR

This work addresses the challenge of occluded potato tubers on harvesters by introducing CoRe++, an encoder–DeepSDF network that completes 3D tuber shapes from a single RGB‑D frame. Through careful preprocessing, data augmentation, and a two‑stage training regime, CoRe++ achieves a volumetric RMSE of $22.6$ ml and reduces it to $18.2$ ml when completion is performed in the image center, all at about $10$ ms per tuber, enabling real‑time, high‑throughput yield estimation. A public dataset of partial and complete tuber meshes, along with code and network weights, supports broader applications to other crops and real‑world deployment on harvesters. The results demonstrate robust generalization across tuber sizes and cultivars, with practical implications for precision agriculture and automated yield monitoring.

Abstract

Potato yield is an important metric for farmers to further optimize their cultivation practices. Potato yield can be estimated on a harvester using an RGB-D camera that can estimate the three-dimensional (3D) volume of individual potato tubers. A challenge, however, is that the 3D shape derived from RGB-D images is only partially completed, underestimating the actual volume. To address this issue, we developed a 3D shape completion network, called CoRe++, which can complete the 3D shape from RGB-D images. CoRe++ is a deep learning network that consists of a convolutional encoder and a decoder. The encoder compresses RGB-D images into latent vectors that are used by the decoder to complete the 3D shape using the deep signed distance field network (DeepSDF). To evaluate our CoRe++ network, we collected partial and complete 3D point clouds of 339 potato tubers on an operational harvester in Japan. On the 1425 RGB-D images in the test set (representing 51 unique potato tubers), our network achieved a completion accuracy of 2.8 mm on average. For volumetric estimation, the root mean squared error (RMSE) was 22.6 ml, and this was better than the RMSE of the linear regression (31.1 ml) and the base model (36.9 ml). We found that the RMSE can be further reduced to 18.2 ml when performing the 3D shape completion in the center of the RGB-D image. With an average 3D shape completion time of 10 milliseconds per tuber, we can conclude that CoRe++ is both fast and accurate enough to be implemented on an operational harvester for high-throughput potato yield estimation. CoRe++'s high-throughput and accurate processing allows it to be applied to other tuber, fruit and vegetable crops, thereby enabling versatile, accurate and real-time yield monitoring in precision agriculture. Our code, network weights and dataset are publicly available at https://github.com/UTokyo-FieldPhenomics-Lab/corepp.git.
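As a rough illustration of the volumetric evaluation described in the abstract, the sketch below estimates the volume enclosed by a signed distance function and computes the RMSE between predicted and reference volumes. It is not the CoRe++ implementation: the analytic sphere stands in for a learned DeepSDF decoder, and the function names (`sphere_sdf`, `volume_from_sdf_ml`, `rmse`), the grid resolution, and the occupancy-counting approach are assumptions made for this example.

```python
import math
from typing import Callable, Sequence

# Hypothetical stand-in for a learned decoder: an analytic sphere SDF.
# Negative values are inside the shape, positive values outside (units: mm).
def sphere_sdf(x: float, y: float, z: float, radius_mm: float = 30.0) -> float:
    return math.sqrt(x * x + y * y + z * z) - radius_mm


def volume_from_sdf_ml(
    sdf: Callable[[float, float, float], float],
    half_extent_mm: float,
    n: int,
) -> float:
    """Approximate the enclosed volume by counting grid-cell centres with sdf < 0.

    The cube [-half_extent_mm, half_extent_mm]^3 is split into n^3 cells.
    This is an assumed, simple integration scheme, not the paper's method.
    """
    step = 2.0 * half_extent_mm / n
    inside = 0
    for i in range(n):
        x = -half_extent_mm + (i + 0.5) * step
        for j in range(n):
            y = -half_extent_mm + (j + 0.5) * step
            for k in range(n):
                z = -half_extent_mm + (k + 0.5) * step
                if sdf(x, y, z) < 0.0:
                    inside += 1
    # Each cell has volume step^3 mm^3; 1 ml equals 1000 mm^3.
    return inside * step ** 3 / 1000.0


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Volume of a 30 mm sphere, compared with the closed-form value.
    estimated = volume_from_sdf_ml(sphere_sdf, half_extent_mm=40.0, n=40)
    exact = 4.0 / 3.0 * math.pi * 30.0 ** 3 / 1000.0
    print(f"estimated: {estimated:.1f} ml, exact: {exact:.1f} ml")
    # Illustrative values only, not measurements from the paper.
    print(f"RMSE: {rmse([100.0, 200.0], [110.0, 190.0]):.1f} ml")
```

A finer grid reduces the discretization error at cubic cost in evaluations. With a neural decoder, the SDF queries would normally be batched rather than evaluated one point at a time.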

Paper Structure (22 sections, 8 equations, 11 figures, 6 tables)


Figures (11)

  • Figure 1: (a) and (b) give overviews of the imaging system installed on a potato harvester in Sarabetsu, Japan. (c) Inside the imaging box, an RGB-D camera was installed, together with four LED strips that provided the necessary illumination inside the box. The sides of the box were covered with a reflective curtain to generate diffuse lighting conditions.
  • Figure 2: Our potato collection method involved marking potato tubers with a colored thumbtack so that the tuber could be easily identified in the image and easily collected after image acquisition. (a), (b), and (c) show a tuber marked with a red thumbtack while it moved over the conveyor belt.
  • Figure 3: The workflow of our 3D reconstruction included three steps: (1) image collection, (2) image preprocessing, (3) 3D reconstruction with Structure-from-Motion (SfM).
  • Figure 4: 3D colored mesh of a potato tuber produced by our 3D reconstruction pipeline. (a) front view, (b) right side view, (c) back view, (d) left side view.
  • Figure 5: Kernel density estimate plots for visualizing the volumetric distribution by potato cultivar in the train, validation and test set.
  • ...and 6 more figures