
How to rewrite the stars: Mapping your orchard over time through constellations of fruits

Gonçalo P. Matos, Carlos Santiago, João P. Costeira, Ricardo L. Saldanha, Ernesto M. Morgado

TL;DR

Problem: re-identifying the same fruits across time-lapse orchard videos to track their growth, despite non-rigid growth and appearance changes. Approach: a constellation-based paradigm that uses 3D fruit centroids and the STaR-i descriptor, which is invariant to translation, rotation, and scale, to match small constellations across scenes with unknown pose and scale. Contributions: (1) STaR-i, a descriptor for very sparse 3D point clouds; (2) a complete pipeline for cross-video re-identification, local registration via RANSAC, and map-based camera localization in 6 DoF; (3) validation on synthetic and real orchard data, including long-term scenarios. Significance: enables autonomous navigation and selective fruit picking in orchards, reducing manual measurement and enabling scalable precision agriculture.
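The invariance property can be illustrated with a deliberately simple stand-in. The sketch below is not STaR-i (Figure 2 indicates that STaR-i is built by defining a coordinate system from the points); it assumes a quad of four 3D fruit centroids and describes it by its six pairwise distances, sorted and divided by the largest one. Distances do not change under translation or rotation, the normalization removes scale, and sorting removes any dependence on the order of the points. The function names quad_descriptor and random_similarity are hypothetical.

```python
import itertools

import numpy as np


def quad_descriptor(quad: np.ndarray) -> np.ndarray:
    """Similarity-invariant descriptor of four 3D points (hypothetical stand-in, not STaR-i).

    Returns the six pairwise distances, sorted ascending and divided by the largest.
    """
    quad = np.asarray(quad, dtype=float)
    if quad.shape != (4, 3):
        raise ValueError("expected an array of shape (4, 3)")
    # Pairwise distances do not change under translation or rotation.
    dists = np.array([np.linalg.norm(quad[i] - quad[j])
                      for i, j in itertools.combinations(range(4), 2)])
    # Sorting makes the result independent of the order of the four points.
    dists.sort()
    if dists[-1] == 0.0:
        raise ValueError("degenerate quad: all points coincide")
    # Dividing by the largest distance removes the global scale.
    return dists / dists[-1]


def random_similarity(rng: np.random.Generator):
    """Draw a random scale, proper rotation and translation for testing."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # flip columns so the QR factor is unique
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]           # turn a reflection into a proper rotation
    return rng.uniform(0.5, 2.0), q, rng.normal(size=3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quad = rng.uniform(-1.0, 1.0, size=(4, 3))
    s, R, t = random_similarity(rng)
    moved = s * quad @ R.T + t       # x -> s R x + t, applied row-wise
    # Same descriptor despite the transform and a reordering of the points.
    print(np.allclose(quad_descriptor(quad), quad_descriptor(moved[[2, 0, 3, 1]])))  # prints True
```

One limitation of this stand-in: pairwise distances are also unchanged by reflections, so it cannot tell a quad from its mirror image. A construction that defines a coordinate system from the points, as Figure 2 indicates STaR-i does, can in principle be made sensitive to handedness.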

Abstract

Following crop growth through the vegetative cycle allows farmers to predict fruit setting and yield at early stages, but it is a laborious and non-scalable task when performed by a human who has to measure fruit sizes manually with calipers or dendrometers. In recent years, computer vision has been used to automate several tasks in precision agriculture, such as detecting and counting fruits and estimating their size. However, the fundamental problem of matching the exact same fruits from one video, collected on a given date, to the fruits visible in another video, collected on a later date, which is needed to track fruit growth over time, remains unsolved. A few attempts have been made, but they either assume that the camera always starts from the same known position and that there are sufficiently distinctive features to match, or they rely on other sources of data such as GPS. Here we propose a new paradigm to tackle this problem, based on constellations of 3D centroids, and introduce a descriptor for very sparse 3D point clouds that can be used to match fruits across videos. Matching constellations instead of individual fruits is key to dealing with non-rigidity, occlusions, and challenging imagery with few distinctive visual features to track. The results show that the proposed method can be used successfully to match fruits across videos and through time, and also to build an orchard map and later use it to estimate the camera pose in 6 DoF, thus providing a basis for applications such as autonomous robot navigation in the orchard and selective fruit picking.
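The summary mentions local registration with RANSAC but does not specify the transformation model or sampling scheme. A minimal sketch, assuming putative fruit correspondences (for example from descriptor matches) and a similarity model (rotation, translation and scale), fits each minimal sample of three correspondences with Umeyama's closed-form least-squares method, a standard technique swapped in here for illustration, and keeps the hypothesis with the most inliers. The names umeyama, ransac_similarity, iters and thresh are hypothetical, and the default values are placeholders.

```python
import numpy as np


def umeyama(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity with dst[i] ≈ s * R @ src[i] + t (Umeyama, 1991).

    src, dst: (n, 3) arrays of corresponding points, n >= 3, not all collinear.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                        # exclude reflections
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)        # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def ransac_similarity(src, dst, iters=500, thresh=0.05, seed=0):
    """RANSAC over putative matches src[i] <-> dst[i]; returns s, R, t, inlier mask."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)   # minimal sample
        s, R, t = umeyama(src[idx], dst[idx])
        resid = np.linalg.norm(s * src @ R.T + t - dst, axis=1)
        inliers = resid < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise RuntimeError("no consensus set with at least 3 matches")
    # Refit on all inliers of the best hypothesis.
    return (*umeyama(src[best], dst[best]), best)
```

Under this sketch's assumptions, if src holds centroids expressed in the current camera frame and dst holds the same fruits in a previously built map, the model reads x_map = s R x_cam + t, so the camera centre in the map is t and its orientation is R, which is one way to obtain a 6-DoF pose. With metric stereo depth one would expect s to be close to 1. The threshold is in the units of the point coordinates.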

Paper Structure

This paper contains 14 sections, 3 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: Overview of the proposed method. (A) Fruits are tracked throughout the image sequence in stereo 3D. (B) Constellations of 3D fruit centroids are matched using the proposed descriptor. (C) Fruits are re-identified across videos.
  • Figure 2: Construction of a geometric descriptor for a quad of 3D "stars". (Left) Defining the coordinate system. (Middle) A side view of the scene. The plane $ABC$ is represented in blue, and vector $\overrightarrow{v}$ in orange. (Right) A top view of the scene. The yellow plane is the plane defined by the line $AB$ and the Z-axis.
  • Figure 3: Registration of two complete semantic point clouds corresponding to two videos of the same trees. Matched fruits do not overlap perfectly.
  • Figure 4: Distance of the transformed 3D points to their original counterparts, as a function of the percentage of occlusions, without noise.
  • Figure 5: Distance of the transformed 3D points to their original counterparts, as a function of the percentage of occlusions and standard deviation of the noise.
  • ...and 3 more figures