NeFF-BioNet: Crop Biomass Prediction from Point Cloud to Drone Imagery

Xuesong Li, Zeeshan Hayder, Ali Zia, Connor Cassidy, Shiming Liu, Warwick Stiller, Eric Stone, Warren Conaty, Lars Petersson, Vivien Rolland

TL;DR

BioNet combines a sparse 3D convolutional neural network (CNN) with a transformer-based prediction module to predict crop biomass from point clouds and other 3D data representations. It is designed to adapt across data modalities, including point clouds and drone imagery.

Abstract

Crop biomass offers crucial insights into plant health and yield, making it essential for crop science, farming systems, and agricultural research. However, current measurement methods, which are labor-intensive, destructive, and imprecise, hinder large-scale quantification of this trait. To address this limitation, we present a biomass prediction network (BioNet), designed for adaptation across different data modalities, including point clouds and drone imagery. Our BioNet, utilizing a sparse 3D convolutional neural network (CNN) and a transformer-based prediction module, processes point clouds and other 3D data representations to predict biomass. To further extend BioNet for drone imagery, we integrate a neural feature field (NeFF) module, enabling 3D structure reconstruction and the transformation of 2D semantic features from vision foundation models into the corresponding 3D surfaces. For the point cloud modality, BioNet demonstrates superior performance on two public datasets, with an approximate 6.1% relative improvement (RI) over the state-of-the-art. In the RGB image modality, the combination of BioNet and NeFF achieves a 7.9% RI. Additionally, the NeFF-based approach utilizes inexpensive, portable drone-mounted cameras, providing a scalable solution for large field applications.

Paper Structure

This paper contains 15 sections, 11 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Overview of our biomass prediction network: Our BioNet (d), consisting of a sparse 3D CNN backbone network and a transformer encoder, takes point clouds or 3D semantic feature maps as input and predicts biomass (e). For the point cloud modality, the point cloud is fed directly into BioNet as the 3D input (c). For the RGB imagery modality, a collection of drone images (a) captured above the crops is processed by our NeFF (b) to generate 3D feature maps from the 2D images, which serve as the 3D input (c).
  • Figure 2: The framework of our BioNet. Point clouds or 3D feature maps are first voxelized, and a sparse 3D CNN backbone then extracts 3D local structure features and compresses the height dimension into one. The final 2D feature map is flattened into a feature embedding, to which a learnable biomass embedding token (yellow cuboid) is appended. This biomass token acts as a comprehensive 3D feature representation and connects to a two-layer MLP for final biomass prediction. The feature embedding is added to a learnable position embedding before the Transformer encoder. The rightmost panel shows the structure of the Transformer encoder, inspired by Vaswani et al. (2017).
  • Figure 3: Graph showing the biomass prediction error against the regularized error for different ground truth values.
  • Figure 4: The framework of our NeFF module. The neural geometry field $\mathcal{E}_{g}$ takes as input the sampled points $\mathbf{x}$ along camera ray $\mathbf{r}(t)$ (red arrow) and sparse points $\mathbf{P}^{S}$ from SfM. The red cuboid ($\mathcal{E}_{g}(\mathbf{x})$) represents the signed distance function (SDF) output for points belonging to $\mathbf{P}^{S}$. The neural feature field $\mathcal{E}_{f}$ predicts the 3D semantic features for points $\mathbf{x}$, and the radiance field $\mathcal{E}_{c}$ predicts the colors of sampled 3D points $\mathbf{x}$ with viewing direction $\mathbf{v}$. The novel-view image and its feature maps are rendered through the volumetric rendering function in NeuS NeuS_3D_reconstru. 3D feature maps extracted from the feature field $\mathcal{E}_{f}$ are used as input to the BioNet, shown in \ref{fig:bpn}.
  • Figure 5: Adaptation capability on MMCBE. "PC" indicates that the method takes a point cloud as its 3D input, while "NeFF" indicates that the 3D input is generated by the NeFF module.
  • ...and 4 more figures
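
The data flow described in the Figure 2 caption (voxelize, compress the height dimension, flatten the 2D map, append a biomass token) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`voxelize`, `compress_height`, `to_token_sequence`), the voxel size, and the grid dimensions are hypothetical, and the learned sparse 3D CNN is replaced by a hand-written stand-in that keeps only the point count and maximum height per column. The learnable position embedding, the Transformer encoder, and the two-layer MLP are omitted, and the biomass token is a fixed placeholder rather than a learned parameter.

```python
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group points by voxel index; only occupied voxels are stored (sparse)."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key].append((x, y, z))
    return voxels


def compress_height(voxels):
    """Collapse the z axis so each (ix, iy) column keeps one feature vector.

    Assumption: this stands in for the learned backbone. The feature is
    simply (number of points, maximum height) per column.
    """
    columns = {}
    for (ix, iy, _iz), pts in voxels.items():
        count, top = columns.get((ix, iy), (0, float("-inf")))
        columns[(ix, iy)] = (count + len(pts), max(top, max(p[2] for p in pts)))
    return columns


def to_token_sequence(columns, grid_w, grid_h, biomass_token):
    """Flatten the 2D map row by row and append the biomass token last."""
    tokens = []
    for iy in range(grid_h):
        for ix in range(grid_w):
            # Empty columns become zero vectors in the dense token sequence.
            count, top = columns.get((ix, iy), (0, 0.0))
            tokens.append([float(count), float(top)])
    tokens.append(list(biomass_token))  # placeholder for the learnable token
    return tokens


# Hypothetical toy point cloud (x, y, z) in metres.
points = [(0.1, 0.1, 0.2), (0.1, 0.2, 0.9), (1.5, 0.3, 0.4)]
columns = compress_height(voxelize(points, voxel_size=0.5))
tokens = to_token_sequence(columns, grid_w=4, grid_h=1, biomass_token=[0.0, 0.0])
print(len(tokens))  # 4 grid cells + 1 biomass token
```

In the architecture the caption describes, the output at the biomass token position after the Transformer encoder would be passed to the MLP head; the sketch stops at the token sequence because everything after that point is learned.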