MaizeEar-SAM: Zero-Shot Maize Ear Phenotyping

Hossein Zaremehrjerdi, Lisa Coffey, Talukder Jubery, Huyu Liu, Jon Turkus, Kyle Linders, James C. Schnable, Patrick S. Schnable, Baskar Ganapathysubramanian

TL;DR

MaizeEar-SAM tackles the labor-intensive measurement of maize yield components by automating kernels-per-row counting with a zero-shot segmentation framework. The method uses the Segment Anything Model (SAM) to mask individual kernels and a graph-theoretic shortest-path approach to identify a kernel row on the ear, combined with a multi-path averaging strategy to improve robustness. The contributions are an annotation-free workflow, a graph-based formal definition of kernels-per-row, and open-source code. The pipeline is evaluated on the High-Intensity Phenotyping Sites (HIPS) dataset, with per-ear processing times ranging from under a second to a few seconds on a high-end GPU, enabling thousands of ears to be phenotyped per day. This work reduces subjectivity in trait measurement, supports scalable data collection for GWAS and breeding, and broadens access to frugal, high-throughput phenotyping.
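The shortest-path idea can be illustrated with a minimal sketch. It assumes kernel centers are already available as (x, y) pixel coordinates with the ear upright, links each kernel to its k nearest neighbors, and runs Dijkstra's algorithm from the bottom-most to the top-most kernel; the number of nodes on the cheapest path is taken as the kernels-per-row count. The k-nearest-neighbor graph, the squared anisotropic edge weight, and the median-x left/right split used for averaging are illustrative assumptions rather than the paper's published definitions, and all function names are hypothetical.

```python
import heapq
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) kernel center; y grows downward as in image coordinates
Graph = Dict[int, List[Tuple[int, float]]]


def build_knn_graph(centers: Sequence[Point], k: int = 4,
                    lateral_weight: float = 2.0) -> Graph:
    """Link each kernel center to its k nearest neighbors (undirected).

    Hypothetical edge weight: (lateral_weight * dx)**2 + dy**2. Squaring makes one
    long jump cost more than several short hops, so the cheapest path visits every
    kernel along a row; lateral_weight makes sideways steps costlier than vertical ones.
    """
    graph: Graph = {i: [] for i in range(len(centers))}
    for i, (xi, yi) in enumerate(centers):
        nearest = sorted(
            (math.dist((xi, yi), (xj, yj)), j)
            for j, (xj, yj) in enumerate(centers) if j != i
        )[:k]
        for _, j in nearest:
            dx, dy = centers[j][0] - xi, centers[j][1] - yi
            w = (lateral_weight * dx) ** 2 + dy ** 2
            graph[i].append((j, w))
            graph[j].append((i, w))  # keep the graph undirected
    return graph


def shortest_path(graph: Graph, source: int, target: int) -> List[int]:
    """Dijkstra's algorithm; returns node indices from source to target ([] if unreachable)."""
    dist = {source: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def kernels_per_row(centers: Sequence[Point], k: int = 4) -> int:
    """Count kernels on the cheapest path from the bottom-most to the top-most kernel."""
    if not centers:
        return 0
    graph = build_knn_graph(centers, k)
    bottom = max(range(len(centers)), key=lambda i: centers[i][1])
    top = min(range(len(centers)), key=lambda i: centers[i][1])
    return len(shortest_path(graph, bottom, top))


def multi_path_count(centers: Sequence[Point], k: int = 4) -> float:
    """Hypothetical averaging: whole ear plus left/right halves split at the median x."""
    if not centers:
        return 0.0
    med = statistics.median(x for x, _ in centers)
    left = [p for p in centers if p[0] <= med]
    right = [p for p in centers if p[0] >= med]
    counts = [kernels_per_row(part, k) for part in (centers, left, right)]
    return sum(counts) / len(counts)
```

Squaring the edge weight is the design choice that matters most in this sketch: with plain Euclidean weights, a jump over one kernel costs exactly as much as two single steps, so the path could skip kernels and undercount.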

Abstract

Quantifying the variation in yield component traits of maize (Zea mays L.), which together determine the overall productivity of this globally important crop, plays a critical role in plant genetics research, plant breeding, and the development of improved farming practices. Grain yield per acre is calculated by multiplying the number of plants per acre, the number of ears per plant, the number of kernels per ear, and the average kernel weight. The number of kernels per ear is, in turn, the number of kernel rows per ear multiplied by the number of kernels per row. Traditional manual methods for measuring these two traits are time-consuming, which limits large-scale data collection. Recent automation efforts using image processing and deep learning face challenges such as high annotation costs and uncertain generalizability. We tackle these issues by exploring large vision models for zero-shot, annotation-free maize kernel segmentation. Using an open-source large vision model, the Segment Anything Model (SAM), we segment individual kernels in RGB images of maize ears and apply a graph-based algorithm to calculate the number of kernels per row. Our approach successfully identifies the number of kernels per row across a wide range of maize ears, showing the potential of zero-shot learning with foundation vision models, combined with image processing techniques, to improve automation and reduce subjectivity in agronomic data collection. All code is open source, making these affordable phenotyping methods broadly accessible.
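The yield decomposition described above can be written compactly. The symbols are introduced here for illustration only: $Y$ is grain yield per acre, $P$ plants per acre, $E$ ears per plant, $K_{\text{ear}}$ kernels per ear, $R$ kernel rows per ear, $K_{\text{row}}$ kernels per row, and $\bar{w}$ the average kernel weight.

```latex
\[
Y = P \cdot E \cdot K_{\text{ear}} \cdot \bar{w},
\qquad
K_{\text{ear}} = R \cdot K_{\text{row}}
\quad\Longrightarrow\quad
Y = P \cdot E \cdot R \cdot K_{\text{row}} \cdot \bar{w}
\]
```

In this notation, MaizeEar-SAM automates the measurement of the $K_{\text{row}}$ factor.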

Paper Structure

This paper contains 19 sections, 1 equation, 11 figures, 1 table, and 2 algorithms.

Figures (11)

  • Figure 1: A sample image from the HIPS dataset. It contains 4 hybrid ears, each labeled with its ID beneath it. The QR codes in the top-right corner correspond to the tray. All ears in a tray share the same genotype.
  • Figure 2: Two distinct path types: the green path, generated by our automated pipeline, and the black path, generated by a human expert. Red dots mark invalid kernels that the expert deemed immature or unhealthy, green dots mark valid kernels that the model should count, and black dots mark the kernels counted by the expert.
  • Figure 3: Our approach to counting kernels-per-row on maize ears, starting from raw images that typically contain four to six ears. First, we extract the individual ears and read the QR code of each tray, producing separate ear images for further processing. Next, we apply the Segment Anything Model (SAM) to each isolated ear to generate a mask and bounding box for every kernel. In post-processing, we refine these masks and compute the center point of each kernel. Finally, we frame the analysis as a graph problem to identify the most representative row and count its kernels, counting only fully developed kernels.
  • Figure 4: Three paths. In the left image, the black dotted line indicates the central path that splits the ear into two halves. The middle and right images show the detailed paths of the left and right halves, respectively, as derived from the initial segmentation.
  • Figure 5: Comparative analysis of path selection and kernel counting. (a) MaizeEar-SAM's kernel counts, including maturity-based kernel filtering, compared with a scenario in which MaizeEar-SAM selects the path but an expert counts the kernels; this isolates the model's counting accuracy from its path selection. (b) Counts for paths selected by MaizeEar-SAM and counted by an expert, compared with counts where an expert performs both path selection and counting, highlighting the differences in subjectivity between automated and manual approaches.
  • ...and 6 more figures
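The post-processing step in Figure 3 (refining SAM masks and computing kernel center points) can be sketched as follows. The sketch assumes mask dictionaries in the format returned by SAM's automatic mask generator, which include a boolean `segmentation` array and an integer `area` in pixels. The area-based refinement rule (keep masks between 0.3 and 3 times the median mask area) is a hypothetical stand-in for the paper's actual refinement, and `kernel_centers` is an illustrative name.

```python
from typing import Dict, List, Tuple

import numpy as np


def kernel_centers(masks: List[Dict], low: float = 0.3,
                   high: float = 3.0) -> List[Tuple[float, float]]:
    """Return (x, y) centroids of masks whose area looks like a single kernel.

    Hypothetical refinement: keep masks with area in [low, high] times the median
    mask area, dropping whole-ear or background masks and tiny fragments.
    Assumes kernels make up most of the masks, so the median is a kernel-sized area.
    """
    if not masks:
        return []
    median_area = float(np.median([m["area"] for m in masks]))
    centers: List[Tuple[float, float]] = []
    for m in masks:
        if not (low * median_area <= m["area"] <= high * median_area):
            continue  # outside the plausible single-kernel size range
        ys, xs = np.nonzero(m["segmentation"])  # row (y) and column (x) indices of mask pixels
        if xs.size == 0:
            continue  # empty mask has no centroid
        centers.append((float(xs.mean()), float(ys.mean())))
    return centers
```

With the `segment_anything` package, `masks` could come from `SamAutomaticMaskGenerator(sam).generate(image)` applied to a single cropped ear; the resulting centers are the nodes of the graph step.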