
SC-NeRF: NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications

Kibon Ku, Talukder Z Jubery, Elijah Rodriguez, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy, Baskar Ganapathysubramanian

TL;DR

The paper tackles the challenge of high-throughput indoor 3D plant phenotyping with stationary cameras. It introduces a NeRF-based reconstruction pipeline that simulates camera motion through COLMAP-derived poses and a simple pose transform. The method enables standard NeRF training on images captured as plants rotate on a pedestal, yielding high-fidelity 3D point clouds (up to 10M points) with F-scores approaching 100.00. Experiments on six plant objects show robust reconstruction quality and competitive overall runtimes, although pose estimation remains a bottleneck. The work includes a public dataset and highlights practical implications for combining NeRF-based 3D phenotyping with expensive imaging modalities, such as hyperspectral cameras, in automated, high-throughput pipelines.
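
The "simple pose transform" is not spelled out in this summary. The sketch below rests on three assumptions: the turntable rotates about the z-axis of a frame fixed to it, the stationary camera's camera-to-world pose in that frame is known, and each frame's turntable angle is known (for example, from a constant rotation speed). Under these assumptions, rotating the object by θ in front of a fixed camera is equivalent to orbiting a virtual camera by -θ around a fixed object. The helpers `rot_z` and `virtual_camera_poses` are hypothetical, and the code is not presented as the authors' transform.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """4x4 homogeneous rotation by theta (radians) about the z-axis (assumed turntable axis)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


def virtual_camera_poses(cam_to_world: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Hypothetical helper: camera-to-object poses for a fixed camera and a rotating object.

    A point p_obj on the object appears in the turntable frame at rot_z(theta) @ p_obj,
    so the fixed camera's pose expressed in the object frame is rot_z(-theta) @ cam_to_world.
    """
    return np.stack([rot_z(-a) @ cam_to_world for a in angles])


# Usage: a camera 1 unit from the axis and 0.5 units up, 8 frames over one full rotation.
cam = np.eye(4)
cam[:3, 3] = [1.0, 0.0, 0.5]
poses = virtual_camera_poses(cam, np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False))
centers = poses[:, :3, 3]
print(np.linalg.norm(centers[:, :2], axis=1))  # constant orbit radius (1.0 for every frame)
```

The static background does not rotate with the object, so in the virtual object frame it becomes inconsistent across views. This is one reason a matte black backdrop and a region of interest help when this kind of transform is used.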

Abstract

This paper presents a NeRF-based framework for point cloud (PCD) reconstruction, specifically designed for indoor high-throughput plant phenotyping facilities. Traditional NeRF-based reconstruction methods require cameras to move around stationary objects, but this approach is impractical for high-throughput environments where objects are rapidly imaged while moving on conveyors or rotating pedestals. To address this limitation, we develop a variant of NeRF-based PCD reconstruction that uses a single stationary camera to capture images as the object rotates on a pedestal. Our workflow comprises COLMAP-based pose estimation, a straightforward pose transformation to simulate camera movement, and subsequent standard NeRF training. A defined Region of Interest (ROI) excludes irrelevant scene data, enabling the generation of high-resolution point clouds (10M points). Experimental results demonstrate excellent reconstruction fidelity, with precision-recall analyses yielding an F-score close to 100.00 across all evaluated plant objects. Although pose estimation remains computationally intensive with a stationary camera setup, overall training and reconstruction times are competitive, validating the method's feasibility for practical high-throughput indoor phenotyping applications. Our findings indicate that high-quality NeRF-based 3D reconstructions are achievable using a stationary camera, eliminating the need for complex camera motion or costly imaging equipment. This approach is especially beneficial when employing expensive and delicate instruments, such as hyperspectral cameras, for 3D plant phenotyping. Future work will focus on optimizing pose estimation techniques and further streamlining the methodology to facilitate seamless integration into automated, high-throughput 3D phenotyping pipelines.
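
The ROI step can be illustrated with an axis-aligned bounding box. This sketch assumes the ROI is a box given by hypothetical corners `roi_min` and `roi_max` in the reconstruction's coordinate frame. The paper may define its ROI differently, for example as the bounds used when sampling the trained NeRF.

```python
import numpy as np


def crop_to_roi(points, roi_min, roi_max):
    """Return the points inside the box [roi_min, roi_max] (inclusive) and the boolean mask.

    The mask can be reused to filter per-point attributes such as colors.
    """
    pts = np.asarray(points, dtype=float)
    lo = np.asarray(roi_min, dtype=float)
    hi = np.asarray(roi_max, dtype=float)
    inside = np.all((pts >= lo) & (pts <= hi), axis=1)
    return pts[inside], inside


# Usage: keep the plant point; drop a stray background point and a point below the ROI.
pts = np.array([[0.0, 0.0, 0.10], [0.5, 0.5, 0.2], [0.02, -0.01, -0.05]])
kept, mask = crop_to_roi(pts, roi_min=[-0.1, -0.1, 0.0], roi_max=[0.1, 0.1, 0.3])
print(kept)  # only the first point lies inside the box
```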

Paper Structure

This paper contains 18 sections, 4 equations, 4 figures, 1 table, 2 algorithms.

Figures (4)

  • Figure 1: Schematic of the stationary camera imaging system for NeRF-based point cloud reconstruction in high-throughput plant phenotyping. In this setup, each plant is conveyed to a rotating, marker-equipped turntable set against a matte black background. Over a full 30-second rotation, a tripod-mounted stationary camera captures high-resolution images that serve as input for NeRF techniques to generate 3D reconstructions. This streamlined approach eliminates the need for complex moving-camera rigs, in line with the goals of efficient, scalable agricultural imaging. The right side shows PCD reconstructions of different objects obtained with the stationary camera: (a) Apricot, (b) Banana, (c) Bell pepper, (d) Maize ear, (e) Crassula ovata, and (f) Haworthia sp.
  • Figure 2: Workflow of the NeRF-based 3D reconstruction pipeline. The process consists of three main steps: (A) Dataset Acquisition, where the experimental environment is set up and multi-view image data are collected using a stationary camera; (B) Data Preprocessing, involving keyframe extraction, pose estimation, and camera calibration to ensure geometric consistency; and (C) NeRF-Based PCD, where a NeRF model is trained for scene representation, followed by PCD Reconstruction, Alignment, and Refinement to generate high-quality 3D point clouds. This structured approach improves the accuracy and scalability of 3D reconstruction for phenotyping and other agricultural vision applications.
  • Figure 3: Experimental setup. (A) Overall setup, where a stationary camera (iPhone 13 Mini) records a rotating object (green bell pepper) placed on a turntable against a black matte fabric to minimize background noise and improve segmentation. (B) Close-up of the turntable and object, highlighting the elevated platform and ArUco markers used for pose estimation and structured scene reconstruction. (C) ArUco markers for pose estimation, where different types of markers are used for feature matching in COLMAP to compute camera poses. (D) Scale calibration, where a ping pong ball (radius = 0.04 m) is measured with a caliper to ensure accurate scaling in the reconstructed point cloud data (PCD). This setup enables precise alignment between the stationary camera’s PCD measurements and the rotating camera’s ground-truth data for quantitative evaluation.
  • Figure 4: Precision-Recall Analysis for different objects based on varying threshold values. Each plot illustrates the relationship between precision (red) and recall (blue) across different thresholds, with the optimal threshold ($\epsilon$) marked by a black dashed line. The F-score for all objects is 100.00, indicating high reconstruction accuracy. The subfigures represent (A) Apricot ($\epsilon = 0.0110$), (B) Banana ($\epsilon = 0.0154$), (C) Bell pepper ($\epsilon = 0.0059$), (D) Maize ear ($\epsilon = 0.0122$), (E) Crassula ovata ($\epsilon = 0.0160$), and (F) Haworthia sp. ($\epsilon = 0.0188$). This comparison evaluates reconstruction accuracy by analyzing precision and recall behavior at various threshold levels.
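
Figure 3(D) uses a ping pong ball of known size to bring the reconstruction into metric units. One way to do this is to segment the ball's points, fit a sphere by linear least squares, and rescale the cloud by the ratio of the known radius to the fitted radius. This approach is an assumption, not taken from the paper. The algebraic fit rewrites |p - c|² = r² as |p|² = 2c·p + k, with k = r² - |c|², which is linear in the unknowns c and k. The function names are hypothetical.

```python
import numpy as np


def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius)."""
    p = np.asarray(points, dtype=float)
    # Linear system: [2x 2y 2z 1] @ [cx cy cz k]^T = x^2 + y^2 + z^2, with k = r^2 - |c|^2.
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])
    b = np.sum(p ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, k = sol[:3], sol[3]
    return center, float(np.sqrt(k + center @ center))


def metric_scale(ball_points, known_radius_m):
    """Scale factor that maps reconstruction units to metres, given the ball's true radius."""
    _, fitted_radius = fit_sphere(ball_points)
    return known_radius_m / fitted_radius


# Usage with synthetic data: a ball of radius 5 (arbitrary NeRF units) centred at (1, 2, 3).
rng = np.random.default_rng(0)
dirs = rng.normal(size=(400, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
ball = np.array([1.0, 2.0, 3.0]) + 5.0 * dirs
s = metric_scale(ball, known_radius_m=0.04)  # radius as listed in the Figure 3 caption
scaled_cloud = s * ball  # apply the same factor to the whole reconstructed cloud
```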
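
Figure 4 reports precision, recall, and F-score at a distance threshold ε. A common definition, used for example by the Tanks and Temples benchmark, works as follows. Precision is the percentage of reconstructed points within ε of the ground-truth cloud. Recall is the percentage of ground-truth points within ε of the reconstruction. The F-score is their harmonic mean. The sketch assumes this definition, a 0-100 scale, and SciPy's `cKDTree` for nearest-neighbour queries. The function name is hypothetical, and the paper's exact protocol may differ.

```python
import numpy as np
from scipy.spatial import cKDTree


def precision_recall_fscore(recon, gt, eps):
    """Precision, recall and F-score (0-100 scale) between two point clouds at threshold eps."""
    recon = np.asarray(recon, dtype=float)
    gt = np.asarray(gt, dtype=float)
    d_recon_to_gt, _ = cKDTree(gt).query(recon)  # nearest GT point for each reconstructed point
    d_gt_to_recon, _ = cKDTree(recon).query(gt)  # nearest reconstructed point for each GT point
    precision = 100.0 * np.mean(d_recon_to_gt < eps)
    recall = 100.0 * np.mean(d_gt_to_recon < eps)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)


# Usage: sweep thresholds as in Figure 4 (synthetic data, not the paper's).
rng = np.random.default_rng(0)
gt = rng.random((500, 3))
recon = gt + rng.normal(scale=0.002, size=gt.shape)
for eps in (0.001, 0.005, 0.01):
    print(eps, precision_recall_fscore(recon, gt, eps))
```

Sweeping ε in this way produces curves like those in Figure 4: both precision and recall rise toward 100 as the threshold grows, so the chosen ε should be reported alongside the F-score.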