AgRowStitch: A High-fidelity Image Stitching Pipeline for Ground-based Agricultural Images

Isaac Kazuo Uyehara, Heesup Yun, Earl Ranario, Mason Earles

TL;DR

The paper tackles the difficulty of stitching ground-based agricultural images taken close to crops, where drift and parallax hinder traditional mosaic methods. It introduces an open-source, row-focused pipeline that stitches images in small batches with constraints on camera motion, using SuperPoint for features and LightGlue for matching, followed by OpenCV-based refinement and straightening. The approach achieves leaf-scale mosaics with roughly 20 cm mean absolute error over a 72 m row across three datasets, enabling coarse georeferencing without GPS or specialized hardware. This has practical impact for agronomists and plant phenotyping, providing accessible, high-resolution row mosaics when precise positioning data are unavailable.

Abstract

Agricultural imaging often requires individual images to be stitched together into a final mosaic for analysis. However, agricultural images can be particularly challenging to stitch because feature matching across images is difficult due to repeated textures, plants are non-planar, and mosaics built from many images can accumulate errors that cause drift. Although these issues can be mitigated by using georeferenced images or taking images at high altitude, there is no general solution for images taken close to the crop. To address this, we created a user-friendly and open-source pipeline for stitching ground-based images of a linear row of crops that does not rely on additional data. First, we use SuperPoint and LightGlue to extract and match features within small batches of images. Then we stitch the images in each batch in series while imposing constraints on the camera movement. After straightening and rescaling each batch mosaic, all batch mosaics are stitched together in series and then straightened into a final mosaic. We tested the pipeline on images collected along 72 m long rows of crops using two different agricultural robots and a camera manually carried over the row. In all three cases, the pipeline produced high-quality mosaics that could be used to georeference real-world positions with a mean absolute error of 20 cm. This approach provides accessible leaf-scale stitching to users who need to coarsely georeference positions within a row, but do not have access to accurate positional data or sophisticated imaging systems.
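To make the georeferencing claim concrete, the sketch below shows one way a pixel column in a final mosaic could be converted to a distance along the row and scored against ground truth. It assumes a simple linear mapping between mosaic width and the 72 m row length; the summary here does not describe the authors' exact procedure, so the function names (`pixel_to_row_position`, `mean_absolute_error`) are hypothetical.

```python
from typing import Sequence

ROW_LENGTH_M = 72.0  # row length reported in the abstract


def pixel_to_row_position(x_px: float, mosaic_width_px: float,
                          row_length_m: float = ROW_LENGTH_M) -> float:
    """Map a pixel column to metres along the row.

    Assumption: the mosaic spans the whole row and distance scales
    linearly with pixel column.
    """
    if mosaic_width_px <= 0:
        raise ValueError("mosaic_width_px must be positive")
    return x_px / mosaic_width_px * row_length_m


def mean_absolute_error(predicted_m: Sequence[float],
                        measured_m: Sequence[float]) -> float:
    """Average absolute difference between predicted and measured positions."""
    if len(predicted_m) != len(measured_m) or not predicted_m:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) for p, m in zip(predicted_m, measured_m))
    return total / len(predicted_m)
```

In use, each marker's pixel column would be passed through `pixel_to_row_position`, and the resulting list compared with positions measured in the field.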

Paper Structure

This paper contains 12 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: An overview of the image and keypoint match selection algorithm. A) Keypoints are extracted using SuperPoint and then matched using LightGlue. B) Keypoint matches are filtered based on their position relative to the stitching edges of both images. C) A homography is estimated using RANSAC and outlier matches are removed. D) If the camera motion from the homography violates the camera movement assumptions, the match is rejected. E) If the homography was accepted, the remaining matches with the highest reprojection error are removed. F) If the reprojection error of the final matches is smaller than a threshold, the image match and the refined homography are accepted.
  • Figure 2: An overview of the stitching pipeline. A) Images are selected in series by attempting to match with more distant images first. When an image matches, the process repeats until a batch of images is complete. The next batch starts with the last image in the previous batch. B) Once the images in a batch have been selected, they are stitched in series. C) Each batch mosaic is sliced into quadrilateral sections and D) those sections are warped into rectangles of constant height. E) The straightened batch mosaics are rescaled and then stitched in series. F) Then the mosaic is straightened again to form G) the final mosaic.
  • Figure 3: Examples of the input images for the A) T4, B) Amiga, and C) monopod datasets as well as the final mosaics for the D) T4, E) Amiga, and F) monopod datasets. Each mosaic represents a 72 m row and was scaled to a fixed length.
  • Figure 4: An example of the stitching quality of a single straightened batch mosaic from the A) T4, B) Amiga, and C) monopod datasets. Although there is noticeable blurring and ghosting in some regions of each mosaic, the overall quality is sufficient for object detection. Areas with ghosting are identified in red circles and enlarged to the right. Each batch mosaic is composed of ten images.
  • Figure 5: An example of the marker position discrepancies across the Amiga iPhone mosaics.
  • ...and 1 more figure
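
The image selection step in the Figure 2 caption (panel A) can be sketched as a greedy loop. This is a minimal illustration rather than the authors' implementation: `can_match` is a hypothetical predicate standing in for the full match test of Figure 1, and `batch_size` and `max_skip` are assumed parameter names.

```python
from typing import Callable, List


def select_batch(start: int, n_images: int, batch_size: int, max_skip: int,
                 can_match: Callable[[int, int], bool]) -> List[int]:
    """Greedily pick image indices for one batch, beginning at `start`."""
    batch = [start]
    current = start
    while len(batch) < batch_size and current < n_images - 1:
        farthest = min(current + max_skip, n_images - 1)
        # Try the most distant candidate first, then step back toward
        # the immediate neighbour until one matches.
        for candidate in range(farthest, current, -1):
            if can_match(current, candidate):
                batch.append(candidate)
                current = candidate
                break
        else:
            break  # no candidate matched; end this batch early
    return batch


def select_batches(n_images: int, batch_size: int, max_skip: int,
                   can_match: Callable[[int, int], bool]) -> List[List[int]]:
    """Split an image sequence into batches that overlap by one image."""
    batches: List[List[int]] = []
    start = 0
    while start < n_images - 1:
        batch = select_batch(start, n_images, batch_size, max_skip, can_match)
        if len(batch) < 2:
            break  # cannot advance past an unmatched image
        batches.append(batch)
        start = batch[-1]  # next batch starts with the previous batch's last image
    return batches
```

Trying distant images first keeps the number of images per mosaic small, which plausibly limits the accumulated error that the abstract identifies as the cause of drift.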