FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework

Lukas Meyer, Andreas Gilson, Ute Schmid, Marc Stamminger

TL;DR

FruitNeRF, a unified novel fruit counting framework that leverages state-of-the-art view synthesis methods to count any fruit type directly in 3D, is introduced.

Abstract

We introduce FruitNeRF, a unified novel fruit counting framework that leverages state-of-the-art view synthesis methods to count any fruit type directly in 3D. Our framework takes an unordered set of posed images captured by a monocular camera and segments fruit in each image. To make our system independent of the fruit type, we employ a foundation model that generates binary segmentation masks for any fruit. Utilizing both modalities, RGB and semantic, we train a semantic neural radiance field. Through uniform volume sampling of the implicit Fruit Field, we obtain fruit-only point clouds. By applying cascaded clustering on the extracted point cloud, our approach achieves a precise fruit count. The use of neural radiance fields provides significant advantages over conventional methods such as object tracking or optical flow, as the counting itself is lifted into 3D. Our method prevents double counting of fruit and avoids counting irrelevant fruit. We evaluate our methodology using both real-world and synthetic datasets. The real-world data consists of three apple trees with manually counted ground truths and a benchmark apple dataset with one row and ground-truth fruit locations, while the synthetic dataset comprises various fruit types including apple, plum, lemon, pear, peach, and mango. Additionally, we assess the performance of fruit counting using the foundation model compared to a U-Net.
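The sampling and counting steps described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: `toy_fruit_field` is a hypothetical analytic stand-in for a trained semantic radiance field, the thresholds `density_thr` and `logit_thr` are assumed values, and a plain voxel connected-components pass replaces the paper's cascaded clustering, whose details are not given here.

```python
import numpy as np

def toy_fruit_field(points, centers, radius):
    """Hypothetical stand-in for a trained field: (density, fruit logit) per point."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1).min(axis=1)
    density = (d < radius).astype(float)           # 1 inside a toy fruit, 0 elsewhere
    fruit_logit = np.where(d < radius, 5.0, -5.0)  # positive logit means "fruit"
    return density, fruit_logit

def sample_fruit_points(field, lo, hi, resolution, density_thr=0.5, logit_thr=0.0):
    """Query the field on a uniform 3D grid and keep only dense fruit samples."""
    axes = [np.linspace(lo[i], hi[i], resolution) for i in range(3)]
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    density, logit = field(pts)
    mask = (density > density_thr) & (logit > logit_thr)
    return mask.reshape(resolution, resolution, resolution), pts[mask]

def count_components(occupancy):
    """Count 6-connected components of a boolean voxel grid by flood fill."""
    occ = occupancy.copy()
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for seed in zip(*np.nonzero(occ)):
        if not occ[seed]:
            continue  # already absorbed into an earlier component
        count += 1
        occ[seed] = False
        stack = [seed]
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in steps:
                n = (x + dx, y + dy, z + dz)
                if all(0 <= n[i] < occ.shape[i] for i in range(3)) and occ[n]:
                    occ[n] = False
                    stack.append(n)
    return count

# Three well-separated toy fruits inside the unit cube (illustrative values).
centers = np.array([[0.2, 0.2, 0.2], [0.8, 0.2, 0.5], [0.5, 0.8, 0.8]])
field = lambda p: toy_fruit_field(p, centers, radius=0.1)
occupancy, fruit_points = sample_fruit_points(field, (0, 0, 0), (1, 1, 1), resolution=32)
print(len(fruit_points), "fruit samples in", count_components(occupancy), "components")
```

Connected components only separate fruits that do not touch, which is why a second stage for multi-fruit clusters is needed in practice.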

Paper Structure (24 sections, 6 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Rendering of an apple tree from our real-world dataset generated with FruitNeRF. On the right: a zoomed-in region with corresponding semantic logits (top) and the appearance rendering (bottom).
  • Figure 2: Pipeline of our proposed fruit counting method, FruitNeRF. Data Preparation (Sec. \ref{sec:datapreparation}) uses structure from motion (SfM) [schoenberger2016sfm] to recover both intrinsic and extrinsic camera parameters. We then extract semantic masks for arbitrary fruit types (Sec. \ref{ssec:fruit_segmentation}) using the foundation model SAM [SAM] and, for apples only, a self-trained U-Net [unet]. The posed RGB and semantic images are used to train a semantic neural radiance field. FruitNeRF (Sec. \ref{sec:fruit_nerf}) encodes the appearance of the scene together with the semantic information. By sampling the Appearance and Fruit Fields (Sec. \ref{sec:nerf_sampling}) uniformly, a dense point cloud is obtained. This paves the way for selecting only the 3D fruit points and clustering them to achieve a precise fruit count (Sec. \ref{sec:fruit_count}).
  • Figure 3: Visualization of data points along the pipeline. (a) depicts the RGB and semantic renderings. (b) shows the tree point cloud extracted from the Density and Appearance Fields (left) and the fruit point cloud, obtained from a combination of the Density and Fruit Fields (right).
  • Figure 4: Second clustering stage: For a multi-fruit cluster with three fruits, we simultaneously cluster the fruit point cloud (greenish dots) with several candidate cluster counts. Each computed cluster center $c$ (magenta dots) serves as the center point for our template fruit (smaller black dots). The cluster count with the minimum Hausdorff distance between the template point cloud and the fruit point cloud is selected.
  • Figure 5: Visualization of the synthetic data rendered with Blender (a) and the three apple trees of our real-world dataset (b).
  • ...and 3 more figures
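
The second clustering stage described in Figure 4 can be sketched as follows. This is a minimal toy version under stated assumptions, not the authors' code: k-means with a deterministic farthest-point initialisation is assumed as the clustering method, the template fruit is assumed to be a sphere of known radius sampled with a Fibonacci lattice, and the names `fibonacci_sphere`, `kmeans`, `hausdorff`, and `estimate_fruit_count` are hypothetical. The toy fruits are also spaced further apart than touching fruits would be, to keep the example unambiguous.

```python
import numpy as np

def fibonacci_sphere(n, radius):
    """Roughly uniform points on a sphere surface (used as data and as template)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (1.0 + 5.0 ** 0.5) * i
    r_xy = np.sqrt(1.0 - z * z)
    return radius * np.stack([r_xy * np.cos(phi), r_xy * np.sin(phi), z], axis=1)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (brute force)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.linalg.norm(points[:, None] - np.array(centers)[None], axis=-1).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(points[:, None] - centers[None], axis=-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def estimate_fruit_count(points, template, max_k=5):
    """Pick the cluster count whose placed templates best match the point cloud."""
    best_k, best_d = None, np.inf
    for k in range(1, max_k + 1):
        centers = kmeans(points, k)
        # Put one copy of the template fruit at every cluster center.
        placed = (centers[:, None, :] + template[None, :, :]).reshape(-1, 3)
        d = hausdorff(placed, points)
        if d < best_d:
            best_k, best_d = k, d
    return best_k

# Toy multi-fruit cluster: three spherical fruits of radius 4 cm (assumed values).
radius = 0.04
template = fibonacci_sphere(200, radius)
fruit_centers = radius * np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [1.5, 2.6, 0.0]])
cloud = np.concatenate([c + template for c in fruit_centers])
print("estimated fruits in cluster:", estimate_fruit_count(cloud, template))
```

Trying every candidate count and scoring it against a template avoids having to choose the number of fruits in advance, at the cost of one clustering run per candidate.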