
RowDetr: End-to-End Crop Row Detection Using Polynomials

Rahul Harsha Cheppally, Ajay Sharda

TL;DR

RowDetr tackles GPS-denied under-canopy crop-row navigation by introducing an end-to-end transformer-based framework that represents rows as polynomials. It couples a lightweight backbone, a hybrid encoder, polynomial proposals, PolySampler, and a multi-scale deformable attention decoder, optimized with the PolyOptLoss energy-based objective and Hungarian one-to-one matching. The method achieves state-of-the-art accuracy (up to TuSimple F1 = 0.74) and low latency (as fast as 3.5 ms with INT8 on a Jetson Orin AGX) across diverse crops and conditions, enabling robust edge deployment. The work demonstrates the practical impact of polynomial parameterization for real-time detection of occluded and curved crop rows in autonomous agricultural robotics, with strong potential for deployment in GPS-denied field operations.
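To make the polynomial representation and the one-to-one matching concrete, here is a minimal Python sketch. It makes several assumptions:

- Each row is a cubic x = c0 + c1*y + c2*y^2 + c3*y^3 in normalized image coordinates.
- The matching cost is the mean horizontal gap between two rows at shared y samples.

The degree, the coordinate convention, the cost and every function name are hypothetical, not RowDetr's actual definitions; a DETR-style matcher usually also includes a classification term. The assignment is found by brute force over permutations. This yields the same minimum-cost one-to-one assignment the Hungarian algorithm computes, but it is only practical for a handful of rows.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Hypothetical row model: x = c0 + c1*y + c2*y**2 + c3*y**3, with x and y
# normalized to [0, 1]. The cubic degree is an assumption for illustration.
Coeffs = Sequence[float]


def eval_poly(coeffs: Coeffs, y: float) -> float:
    """Evaluate x(y) with Horner's rule; coefficients are lowest order first."""
    x = 0.0
    for c in reversed(coeffs):
        x = x * y + c
    return x


def row_cost(pred: Coeffs, gt: Coeffs, num_samples: int = 10) -> float:
    """Hypothetical matching cost: mean |x_pred - x_gt| at evenly spaced y."""
    ys = [i / (num_samples - 1) for i in range(num_samples)]
    return sum(abs(eval_poly(pred, y) - eval_poly(gt, y)) for y in ys) / num_samples


def match_rows(preds: List[Coeffs], gts: List[Coeffs]) -> Tuple[Dict[int, int], float]:
    """Minimum-cost one-to-one assignment of ground-truth rows to predictions.

    Brute force over ordered prediction subsets reaches the same optimum as the
    Hungarian algorithm, but only scales to small numbers of rows.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground-truth rows")
    cost = [[row_cost(p, g) for g in gts] for p in preds]
    best, best_total = None, float("inf")
    # chosen[g] is the prediction index assigned to ground-truth row g.
    for chosen in permutations(range(len(preds)), len(gts)):
        total = sum(cost[p][g] for g, p in enumerate(chosen))
        if total < best_total:
            best, best_total = chosen, total
    return {g: p for g, p in enumerate(best)}, best_total


if __name__ == "__main__":
    preds = [(0.30, 0.0, 0.0, 0.0), (0.70, 0.05, 0.0, 0.0), (0.10, 0.0, 0.0, 0.0)]
    gts = [(0.72, 0.05, 0.0, 0.0), (0.29, 0.0, 0.0, 0.0)]
    mapping, total = match_rows(preds, gts)
    print(mapping, round(total, 3))  # {0: 1, 1: 0} 0.03
```

In DETR-style training, an unmatched query (prediction 2 in this toy example) would be supervised toward a "no row" confidence. This is consistent with the low-confidence white predictions shown in Figure 4.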

Abstract

Crop row detection enables autonomous robots to navigate in GPS-denied environments. Vision-based strategies often struggle in these environments because of gaps and curved crop rows, and they typically require post-processing steps. Furthermore, accurately labeling crop rows in under-canopy environments is very difficult due to occlusions. This study introduces RowDetr, an efficient end-to-end transformer-based neural network for crop row detection in precision agriculture. RowDetr leverages a lightweight backbone and a hybrid encoder to model straight, curved, or occluded crop rows with high precision. Central to the architecture is a novel polynomial representation that enables direct parameterization of crop rows, eliminating computationally expensive post-processing. Key innovations include a PolySampler module and multi-scale deformable attention. These work together with PolyOptLoss, an energy-based loss function designed to optimize geometric alignment between predicted and annotated crop rows while also enhancing robustness against labeling noise. RowDetr was evaluated against other state-of-the-art end-to-end crop row detection methods, such as AgroNav and RolColAttention. The evaluation used a diverse dataset of 6,962 high-resolution images, spanning multiple crop types with annotated crop rows and used for training, validation, and testing. The system demonstrated superior performance, achieving an F1 score of up to 0.74 and a lane position deviation as low as 0.405. Furthermore, RowDetr achieves a real-time inference latency of 6.7 ms, which was reduced to 3.5 ms with INT8 quantization on an NVIDIA Jetson Orin AGX. This work highlights the efficiency of polynomial parameterization, making RowDetr particularly suitable for deployment on edge computing devices in agricultural robotics and autonomous farming equipment.

Index Terms: Crop Row Detection, Under Canopy Navigation, Transformers, RT-DETR, RT-DETRv2
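The F1 score quoted above summarizes how many predicted rows correspond to annotated rows. As a hedged illustration, the sketch below computes a row-level F1 from rows sampled at shared y positions.

- The 20-pixel point threshold and the 85% per-row accuracy are TuSimple-like values, assumed here rather than taken from the paper.
- The greedy matcher and all function names are likewise assumptions.

The paper's exact evaluation protocol, and its lane position deviation metric, may differ.

```python
from typing import List, Sequence

Row = Sequence[float]  # x positions (pixels) of one row at fixed y samples


def row_accuracy(pred: Row, gt: Row, pixel_thresh: float) -> float:
    """Fraction of sample points where |x_pred - x_gt| <= pixel_thresh."""
    hits = sum(abs(p - g) <= pixel_thresh for p, g in zip(pred, gt))
    return hits / len(gt)


def row_f1(preds: List[Row], gts: List[Row],
           pixel_thresh: float = 20.0, row_thresh: float = 0.85) -> float:
    """Row-level F1: a prediction is a true positive if it matches a GT row.

    Matching is greedy (each GT row claims its most accurate unused prediction);
    both thresholds are assumed TuSimple-like values, not RowDetr's settings.
    """
    used = set()
    tp = 0
    for gt in gts:
        best_i, best_acc = None, 0.0
        for i, pred in enumerate(preds):
            if i in used:
                continue
            acc = row_accuracy(pred, gt, pixel_thresh)
            if acc > best_acc:
                best_i, best_acc = i, acc
        if best_i is not None and best_acc >= row_thresh:
            used.add(best_i)
            tp += 1
    fp = len(preds) - tp  # predictions that matched no ground-truth row
    fn = len(gts) - tp    # ground-truth rows that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gts = [[100, 110, 120, 130], [300, 300, 300, 300]]
    preds = [[105, 112, 118, 129], [400, 400, 400, 400], [302, 299, 301, 300]]
    print(round(row_f1(preds, gts), 3))  # 0.8
```

In this toy case, two of the three predictions match annotated rows (precision 2/3, recall 1), so F1 = 0.8.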

Paper Structure

This paper contains 33 sections, 8 equations, 11 figures, 8 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Overview of the proposed architecture. The input image is first processed by a backbone to extract multi-scale features, which are then passed through a hybrid encoder. The encoder proposes a set of top queries, while denoising queries are also introduced to stabilize training. A polynomial sampler, followed by an offset network, refines the proposed queries, enabling structured sampling along polynomial paths. These refined queries are passed to a multi-scale deformable attention decoder, which produces final predictions. The image on the right visualizes the predicted curves (red, green, and blue), each corresponding to a distinct row hypothesis.
  • Figure 2: Hardware setup used for data collection. The under-the-canopy robot (left) captures high-resolution images for row detection, while the metering stick (right) provides accurate height measurements.
  • Figure 3: Representative samples from the dataset illustrating various challenging field conditions: canopy occlusion (a), row curvature (a), weed presence (c), surface residue (b, d), and weeds growing within the rows (e).
  • Figure 4: Augmented samples from a training batch: labeled rows (blue), positive predictions with confidence greater than 0.5 (green), and predictions of non-existent rows with confidence below 0.5 (white).
  • Figure 5: The top encoder queries (shown as red blocks) are projected onto the image space and sampled at equidistant intervals along polynomial trajectories, denoted $S_p$. These sampled points capture structured features from the encoder feature map, visualized as a heatmap. An offset network then refines these sampled points, producing $N_p$, which are passed through the Multi-Scale Deformable Attention module. The right image visualizes the attention scores of $N_p$ across feature levels for their respective polynomials (same color), with marker size indicating attention magnitude. The colored curves (green, blue, red) represent top encoder proposals, shown in no particular order.
  • ...and 6 more figures
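Figure 5 describes the PolySampler flow. Points $S_p$ are sampled at equidistant intervals along each proposed polynomial and read from the encoder feature map. An offset network then shifts them into $N_p$ before multi-scale deformable attention. The minimal Python sketch below mirrors that flow for a single row on a single-channel feature map. It rests on these assumptions:

- Spacing is uniform in the vertical image coordinate.
- Fixed hand-written offsets stand in for the learned offset network.
- Features are read with plain bilinear interpolation.

These choices and all function names are hypothetical rather than RowDetr's implementation, which operates on batched multi-scale tensors.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), normalized to [0, 1]


def eval_poly(coeffs: Sequence[float], y: float) -> float:
    """Evaluate x(y) with Horner's rule; coefficients are lowest order first."""
    x = 0.0
    for c in reversed(coeffs):
        x = x * y + c
    return x


def sample_along_poly(coeffs: Sequence[float], num_points: int,
                      y_start: float = 0.0, y_end: float = 1.0) -> List[Point]:
    """S_p: points on x = f(y), evenly spaced in y (assumed rule; needs >= 2 points)."""
    step = (y_end - y_start) / (num_points - 1)
    ys = [y_start + i * step for i in range(num_points)]
    return [(eval_poly(coeffs, y), y) for y in ys]


def apply_offsets(points: List[Point], offsets: List[Point]) -> List[Point]:
    """N_p: shift each sampled point by a (dx, dy) offset, clamped to the image."""
    def clamp(v: float) -> float:
        return min(max(v, 0.0), 1.0)
    return [(clamp(x + dx), clamp(y + dy)) for (x, y), (dx, dy) in zip(points, offsets)]


def bilinear_sample(fmap: List[List[float]], x: float, y: float) -> float:
    """Read a single-channel H x W feature map at a normalized (x, y) location."""
    h, w = len(fmap), len(fmap[0])
    px, py = x * (w - 1), y * (h - 1)  # corner samples map to corner cells
    x0, y0 = int(px), int(py)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = px - x0, py - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


if __name__ == "__main__":
    fmap = [[3 * r + c for c in range(3)] for r in range(3)]  # toy 3x3 feature map
    s_p = sample_along_poly((0.5,), num_points=3)  # vertical row at x = 0.5
    n_p = apply_offsets(s_p, [(0.1, 0.0), (0.0, 0.0), (0.0, 0.2)])  # stand-in offsets
    print(s_p)  # [(0.5, 0.0), (0.5, 0.5), (0.5, 1.0)]
    print([bilinear_sample(fmap, x, y) for x, y in s_p])  # [1.0, 4.0, 7.0]
```

Sampling along the polynomial gives every query a structured set of probe points tied to its row hypothesis. The offsets then let the model move individual probes toward more informative features.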