Scalable Vision-Guided Crop Yield Estimation
Harrison H. Li, Medhanie Irgau, Nabil Janmohamed, Karen Solveig Rieckmann, David B. Lobell
TL;DR
The paper tackles accurate zone-level crop yield estimation under limited ground-truth data by blending expensive crop cuts with scalable field photos through Prediction-Powered Inference (PPI). It develops the PPI++ estimator that learns a low-dimensional control function to recalibrate photo-based predictions using field coordinates, yielding estimates that are asymptotically unbiased and whose asymptotic variance is no larger than that of crop cuts alone. Empirical evidence on rice and maize in sub-Saharan Africa shows finite-sample gains in effective sample size (up to 73% for rice and 12-23% for maize) and narrower confidence intervals, outperforming photo-naive and AIPW baselines at minimal cost to coverage. The approach could lower the cost of crop insurance analytics and can extend to other estimands or data sources such as satellites, with pooling across zones offering practical finite-sample benefits.
Abstract
Precise estimation and uncertainty quantification for average crop yields are critical for agricultural monitoring and decision making. Existing data collection methods, such as crop cuts in randomly sampled fields at harvest time, are relatively time-consuming. Thus, we propose an approach based on prediction-powered inference (PPI) to supplement these crop cuts with less time-consuming field photos. After training a computer vision model to predict the ground truth crop cut yields from the photos, we learn a "control function" that recalibrates these predictions with the spatial coordinates of each field. This enables fields with photos but not crop cuts to be leveraged to improve the precision of zone-wide average yield estimates. Our control function is learned by training on a dataset of nearly 20,000 real crop cuts and photos of rice and maize fields in sub-Saharan Africa. To improve precision, we pool training observations across different zones within the same first-level subdivision of each country. Our final PPI-based point estimates of the average yield are provably asymptotically unbiased and cannot increase the asymptotic variance beyond that of the natural baseline estimator (the sample average of the crop cuts) as the number of fields grows. We also propose a novel bias-corrected and accelerated (BCa) bootstrap to construct accompanying confidence intervals. Even in zones with as few as 20 fields, the point estimates show significant empirical improvement over the baseline, increasing the effective sample size by as much as 73% for rice and by 12-23% for maize. The confidence intervals are accordingly shorter at minimal cost to empirical finite-sample coverage. This demonstrates the potential for relatively low-cost images to make area-based crop insurance more affordable and thus spur investment into sustainable agricultural practices.
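To make the idea of combining crop cuts with photo-based predictions concrete, the following is a minimal sketch of a generic PPI-style mean estimator with a single scalar tuning weight. It is an illustration under stated assumptions, not the authors' method: the paper's control function uses spatial coordinates and its intervals come from a BCa bootstrap, neither of which is reproduced here. The function name `ppi_mean`, the scalar weight `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def ppi_mean(y_lab, f_lab, f_unlab):
    """Generic PPI-style estimate of a mean (illustrative sketch only).

    y_lab   : measured yields on fields with crop cuts (n values)
    f_lab   : model predictions on those same fields (n values)
    f_unlab : model predictions on fields with photos only (N values)
    Returns (point estimate, estimated variance, tuning weight).
    """
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y_lab), len(f_unlab)

    # Variance of the predictions, pooled over all fields.
    var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
    # Covariance between measured yields and predictions on labeled fields.
    cov_yf = np.cov(y_lab, f_lab, ddof=1)[0, 1]

    # Assumption: scalar weight minimizing the estimated variance.
    # Falls back to 0 (plain crop-cut average) if predictions are constant.
    lam = cov_yf / ((1.0 + n / N) * var_f) if var_f > 0 else 0.0

    # Crop-cut average, shifted by the weighted gap between predictions
    # on unlabeled and labeled fields.
    est = y_lab.mean() + lam * (f_unlab.mean() - f_lab.mean())
    var_est = np.var(y_lab - lam * f_lab, ddof=1) / n + lam**2 * var_f / N
    return est, var_est, lam

# Synthetic demonstration (made-up numbers, not data from the paper).
rng = np.random.default_rng(0)
n, N = 20, 200
truth = rng.normal(3.0, 1.0, size=n + N)          # hypothetical yields
preds = truth + rng.normal(0.0, 0.5, size=n + N)  # noisy photo predictions
est, var_est, lam = ppi_mean(truth[:n], preds[:n], preds[n:])

baseline_var = np.var(truth[:n], ddof=1) / n      # crop cuts alone
ess_ratio = baseline_var / var_est                # effective sample size ratio
print(f"estimate={est:.3f}, weight={lam:.3f}, ESS ratio={ess_ratio:.2f}")
```

Setting the weight to zero recovers the plain sample average of the crop cuts, which is why an estimator of this form can match the baseline when the predictions carry no information. The ratio of the baseline variance to the estimated PPI variance is one way to express an effective sample size gain of the kind reported above.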
