Scalable Vision-Guided Crop Yield Estimation

Harrison H. Li, Medhanie Irgau, Nabil Janmohamed, Karen Solveig Rieckmann, David B. Lobell

TL;DR

The paper tackles accurate zone-level crop yield estimation under limited ground-truth data by blending expensive crop cuts with scalable field photos through Prediction-Powered Inference (PPI). It adapts the PPI++ estimator by learning a low-dimensional control function that recalibrates photo-based predictions using field coordinates, giving asymptotically unbiased estimates whose asymptotic variance is no larger than that of crop cuts alone. Empirical evidence on rice and maize in sub-Saharan Africa shows substantial finite-sample gains in effective sample size and narrower confidence intervals, outperforming photo-naive and AIPW baselines while preserving coverage. The approach enables lower-cost crop insurance analytics and can extend to other estimands or data sources such as satellites, with pooling across zones offering practical finite-sample benefits.

Abstract

Precise estimation and uncertainty quantification for average crop yields are critical for agricultural monitoring and decision making. Existing data collection methods, such as crop cuts in randomly sampled fields at harvest time, are relatively time-consuming. Thus, we propose an approach based on prediction-powered inference (PPI) to supplement these crop cuts with less time-consuming field photos. After training a computer vision model to predict the ground truth crop cut yields from the photos, we learn a "control function" that recalibrates these predictions with the spatial coordinates of each field. This enables fields with photos but not crop cuts to be leveraged to improve the precision of zone-wide average yield estimates. Our control function is learned by training on a dataset of nearly 20,000 real crop cuts and photos of rice and maize fields in sub-Saharan Africa. To improve precision, we pool training observations across different zones within the same first-level subdivision of each country. Our final PPI-based point estimates of the average yield are provably asymptotically unbiased and cannot increase the asymptotic variance beyond that of the natural baseline estimator (the sample average of the crop cuts) as the number of fields grows. We also propose a novel bias-corrected and accelerated (BCa) bootstrap to construct accompanying confidence intervals. Even in zones with as few as 20 fields, the point estimates show significant empirical improvement over the baseline, increasing the effective sample size by as much as 73% for rice and by 12-23% for maize. The confidence intervals are accordingly shorter at minimal cost to empirical finite-sample coverage. This demonstrates the potential for relatively low-cost images to make area-based crop insurance more affordable and thus spur investment into sustainable agricultural practices.
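
To make the idea of supplementing crop cuts with photo-based predictions concrete, here is a minimal sketch of a generic power-tuned PPI++ estimator of a mean. It is not the authors' pipeline: it omits the control function over spatial coordinates, the pooling across zones, and the BCa bootstrap. The function name `ppi_pp_mean`, its arguments, and the simulated data are hypothetical. It assumes the labeled fields (crop cut plus photo) and the unlabeled fields (photo only) are disjoint random samples from the same zone, and it uses the standard plug-in tuning weight for mean estimation, Cov(Y, f) / ((1 + n/N) Var(f)).

```python
import numpy as np


def ppi_pp_mean(y_lab, f_lab, f_unlab):
    """Generic PPI++ estimate of a mean (illustrative sketch only).

    y_lab   : crop-cut yields on the n labeled fields
    f_lab   : model predictions on those same n fields
    f_unlab : model predictions on N fields that have photos but no crop cut
    Returns (theta_hat, lambda_hat).
    """
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y_lab), len(f_unlab)

    # Variance of the predictions, pooled over all fields with a photo.
    var_f = np.concatenate([f_lab, f_unlab]).var(ddof=1)
    # Covariance between yields and predictions on the labeled fields.
    cov_yf = np.cov(y_lab, f_lab, ddof=1)[0, 1]

    # Plug-in tuning weight; falls back to 0 (crop cuts only) if the
    # predictions carry no variation.
    lam = 0.0 if var_f == 0 else cov_yf / ((1.0 + n / N) * var_f)

    # Crop-cut average, shifted by the weighted gap between the average
    # prediction on unlabeled fields and on labeled fields.
    theta = y_lab.mean() + lam * (f_unlab.mean() - f_lab.mean())
    return theta, lam


if __name__ == "__main__":
    # Hypothetical simulated zone: 100 fields, 20 of them with crop cuts.
    rng = np.random.default_rng(0)
    y = rng.gamma(shape=2.0, scale=1.5, size=100)
    f = y + rng.normal(0.0, 0.5, size=100)  # noisy stand-in for photo predictions
    theta, lam = ppi_pp_mean(y[:20], f[:20], f[20:])
    print(f"crop-cut mean: {y[:20].mean():.3f}  PPI++: {theta:.3f}  lambda: {lam:.3f}")
```

When the predictions are uninformative, the estimated weight shrinks toward zero and the estimate falls back to the crop-cut average, which is the intuition behind the claim that the asymptotic variance cannot exceed that of the baseline.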

Paper Structure

This paper contains 13 sections, 1 theorem, 65 equations, 11 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

In the setting of Algorithm \ref{alg:full_pipeline}, suppose that for each zone $j=1,\ldots,J$, the sample sizes $n_j$ and $N_j$ satisfy $n_j/N_j \rightarrow \rho_j \in (0,\infty)$ as $N_j \rightarrow \infty$, the labeled observations $\{(Y_{ji},V_{ji},X_{ji})\}_{i=1}^{n_j}$ are independent and identically [...] where $f_{r(j)}^*(W_{ji}) = (1,g(V_{ji}),X_{ji})\beta_{r(j)}$ and $\hat{\lambda}_j \stackrel{p}{\rightarrow}$ [...]

Figures (11)

  • Figure 1: (Top row) Sample rice field photos from Nigeria in harvest year 2022. The fields have ground truth (crop cut) yields of $4.7\ \texttt{mt ha}^{-1}$ (left) and $0.0088\ \texttt{mt ha}^{-1}$ (right). (Bottom row) Sample maize field photos from Zimbabwe in harvest year 2024. The fields have ground truth (crop cut) yields of $9.1\ \texttt{mt ha}^{-1}$ (left) and $0.0061\ \texttt{mt ha}^{-1}$ (right).
  • Figure 2: Histograms of the number of fields within each zone in our dataset, separated by country and harvest year
  • Figure 3: Histograms of field-level yields, separated by country and harvest year. All observations to the left of the solid vertical lines at 0 are exact zeros.
  • Figure 4: The estimated "MSE-based" (left) and "CI-based" (right) relative efficiencies of the proposed method (ppipp) and other alternatives described in the text for each (country, harvest year) pair. The dots are point estimates for each relative efficiency, computed as averages of squared error ratios (left) and squared CI width ratios (right) across all synthetic bootstrap datasets for all zones. The error bars are 95% BCa bootstrap confidence intervals for the true relative efficiencies, where the uncertainty stems from having only sampled a finite number of zones in each (country, harvest year) pair. The dashed vertical grey lines indicate the theoretical asymptotic relative efficiencies of the PPI++ estimator based on \ref{eq:ppipp_rel_eff} with $N_j/n_j=4$ and $R_j^2$ estimated in each zone by the squared Pearson correlations between the computer vision model predictions and ground truth yields.
  • Figure 5: The estimated effective sample size is plotted on each vertical axis against the actual zone size on the horizontal axis for each (country, harvest year) pair. Effective sample size is defined for each zone as the ratio of squared error (left) or squared ratio of CI width (right) from $\hat{\theta}_{\textnormal{lbl}}$ to that from $\hat{\theta}_{\textnormal{PPI++}}$, multiplied by the zone size. The blue lines are ordinary linear regression fits to the scatterplots and the dashed lines pass through the origin with slope 1.
  • ...and 6 more figures
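
The "MSE-based" effective sample size described in the Figure 5 caption is simple arithmetic: the ratio of the baseline's squared error to that of PPI++, multiplied by the zone size. A minimal sketch follows; the function name `effective_sample_size` and the numbers in the usage line are hypothetical and are not results from the paper.

```python
def effective_sample_size(sq_err_lbl, sq_err_ppi, zone_size):
    """Effective sample size as described in the Figure 5 caption.

    sq_err_lbl : squared error of the crop-cut-only (labeled) estimator
    sq_err_ppi : squared error of the PPI++ estimator
    zone_size  : actual number of fields in the zone
    """
    if sq_err_ppi <= 0:
        raise ValueError("sq_err_ppi must be positive")
    # A ratio above 1 means PPI++ behaves like a larger crop-cut sample.
    return (sq_err_lbl / sq_err_ppi) * zone_size


# Hypothetical zone of 20 fields where PPI++ halves the squared error.
print(effective_sample_size(0.5, 0.25, 20))
```

The "CI-based" variant in the same caption would use the squared ratio of confidence interval widths in place of the squared error ratio.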

Theorems & Definitions (2)

  • Theorem 1
  • proof