Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting

Daiwei Zhang, Joaquin Gajardo, Tomislav Medic, Isinsu Katircioglu, Mike Boss, Norbert Kirchgessner, Achim Walter, Lukas Roth

TL;DR

Wheat3DGS introduces in-field 3D reconstruction and 3D wheat head instance segmentation by combining explicit 3D Gaussian Splatting (3DGS) with Segment Anything Model (SAM). The pipeline uses YOLOv5 to propose 2D wheat head boxes, SAM to generate masks, and an ILP- and IoU-based multi-view association to lift those masks to 3D, enabling per-head trait extraction (length $L$, width $W$, volume $V$) from canopy reconstructions. Compared to NeRF-based and traditional MVS approaches, Wheat3DGS achieves superior novel view synthesis quality and more accurate head segmentation, with 3D trait estimates showing moderate to strong genotype discriminability when aggregated at the row level; TLS data provide a robust reference for validation. The work enables scalable, non-destructive phenotyping in field conditions and includes a dataset with RGB images, camera poses, laser scans, and view-consistent segmentation masks to support future HTFP research.
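The IoU-based part of the multi-view association can be illustrated with a small sketch. This is not the authors' implementation: it replaces the ILP with a simple greedy one-to-one matching, and the function names (`mask_iou`, `greedy_associate`) and the `iou_threshold` value are assumptions made for illustration only. It pairs masks rendered from the current 3D segmentation with masks detected in a new view.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks of identical shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def greedy_associate(rendered, detected, iou_threshold=0.5):
    """Match rendered masks to detected masks one-to-one, highest IoU first.

    Greedy stand-in for an ILP-based assignment (assumption, not the paper's
    solver). Returns a list of (rendered_idx, detected_idx, iou) tuples.
    """
    scores = [
        (mask_iou(r, d), i, j)
        for i, r in enumerate(rendered)
        for j, d in enumerate(detected)
    ]
    scores.sort(key=lambda t: t[0], reverse=True)

    pairs, used_r, used_d = [], set(), set()
    for iou, i, j in scores:
        if iou < iou_threshold:
            break  # all remaining candidates overlap too little
        if i in used_r or j in used_d:
            continue  # each mask may be matched at most once
        pairs.append((i, j, iou))
        used_r.add(i)
        used_d.add(j)
    return pairs
```

Detected masks left unmatched would be candidates for new wheat head instances, while matched pairs would update an existing instance; how the pipeline actually handles these cases is described in the paper, not here.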

Abstract

Automated extraction of plant morphological traits is crucial for supporting crop breeding and agricultural management through high-throughput field phenotyping (HTFP). Solutions based on multi-view RGB images are attractive due to their scalability and affordability, enabling volumetric measurements that 2D approaches cannot directly capture. While advanced methods like Neural Radiance Fields (NeRFs) have shown promise, their application has been limited to counting or extracting traits from only a few plants or organs. Furthermore, accurately measuring complex structures like individual wheat heads, which are essential for studying crop yields, remains particularly challenging due to occlusions and the dense arrangement of crop canopies in field conditions. The recent development of 3D Gaussian Splatting (3DGS) offers a promising alternative for HTFP due to its high-quality reconstructions and explicit point-based representation. In this paper, we present Wheat3DGS, a novel approach that leverages 3DGS and the Segment Anything Model (SAM) for precise 3D instance segmentation and morphological measurement of hundreds of wheat heads automatically, representing the first application of 3DGS to HTFP. We validate the accuracy of wheat head extraction against high-resolution laser scan data, obtaining per-instance mean absolute percentage errors of 15.1%, 18.3%, and 40.2% for length, width, and volume. We provide additional comparisons to NeRF-based approaches and traditional Multi-View Stereo (MVS), demonstrating superior results. Our approach enables rapid, non-destructive measurements of key yield-related traits at scale, with significant implications for accelerating crop breeding and improving our understanding of wheat development.
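For readers unfamiliar with the metric, the per-instance mean absolute percentage error reported above can be computed as in the following sketch. It assumes the standard definition (mean of |estimate - reference| / |reference|, in percent); the function name `mape` and the numbers in the usage note are hypothetical and are not data from the paper.

```python
def mape(estimates, references):
    """Mean absolute percentage error, in percent.

    `estimates` are per-instance trait values (e.g. from the 3D
    reconstruction) and `references` the matching ground-truth values
    (e.g. from laser scans). References must be non-zero.
    """
    if len(estimates) != len(references) or len(references) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - r) / abs(r) for e, r in zip(estimates, references))
    return 100.0 * total / len(references)
```

With made-up head lengths, `mape([11.0, 9.0], [10.0, 10.0])` gives an error of about 10%, since each estimate deviates from its reference by one tenth.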

Paper Structure

This paper contains 24 sections, 2 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: 3D Gaussian Splatting reconstruction of a wheat plot with segmented 3D wheat head instances (in different colors). We use 30 views for reconstruction (red frustums) and 6 out-of-distribution views for evaluation (green frustums).
  • Figure 2: Overview of our pipeline: Given a set of RGB images capturing our target wheat field plot with a system of overhead cameras, we extract 2D segmentation masks of detected wheat heads and reconstruct a 3D representation of the plot using 3D Gaussian Splatting as initialization. For robust 3D wheat head segmentation, we propose a match-and-fine-tune strategy that iteratively associates collections of decoupled masks and refines the 3D Gaussian representation of each segmented wheat head by alternating between lifting 2D masks to 3D and projecting 3D segmentations back to other views.
  • Figure 3: Comparison of Nerfacto and 3DGS* (gsplat implementation) renderings from a test view. Matching zoom regions (1-3) below each image highlight structural details.
  • Figure 4: Left: visualization of 2D detection and segmentation of wheat heads on a train image from (a) pre-trained YOLOv5, (b) Segment Anything with detected bounding boxes as prompt, and (c) state-of-the-art GroundedSAM2 version which combines Grounding DINO 1.5 with SAHI (Slicing Aided Hyper Inference), with "wheat head" as text prompt. Right: qualitative evaluation of novel view mask rendering from the 3D segmentation obtained by our pipeline (using (b)). (d) and (e) compare the projection of 3D instance segmentation onto 2D in a novel view with human-labeled wheat head segmentation. Green represents the overlap between rendered masks and ground truth (GT), i.e. correct segmentation of wheat head; orange indicates false positive segmentation; and red represents wheat heads not identified in the 3D instance segmentation, resulting in their absence in the projected 2D masks.
  • Figure 5: Corresponding wheat head instances of one experimental plot extracted from all three datasets.
  • ...and 5 more figures