DAVIS-Ag: A Synthetic Plant Dataset for Prototyping Domain-Inspired Active Vision in Agricultural Robots

Taeyeong Choi, Dario Guevara, Zifei Cheng, Grisha Bandodkar, Chonghan Wang, Brian N. Bailey, Mason Earles, Xin Liu

TL;DR

DAVIS-Ag addresses the absence of a standardized benchmark for domain-inspired active vision in agriculture by delivering a large-scale synthetic dataset with densely connected viewpoints around realistic plant scenes. Built with AgML and Helios, it covers 632 virtual orchards across three crop types and two scale settings, and it includes rich labels (bounding boxes, instance masks) plus view-to-view action pointers that enable active viewpoint planning. Baseline and RL experiments demonstrate that learned active vision policies can substantially improve target visibility, and transfer experiments indicate usefulness for real-world prototyping. The dataset aims to standardize benchmarking, accelerate domain-specific active vision research in outdoor agricultural settings, and support tasks such as fruit detection, yield estimation, and navigation in occluded environments.

Abstract

In agricultural environments, viewpoint planning can be a critical capability for a robot with visual sensors to obtain informative observations of objects of interest (e.g., fruits) within complex plant structures with random occlusions. Although recent studies on active vision have shown some potential for agricultural tasks, each model has been designed and validated in a unique environment that cannot easily be replicated for benchmarking novel methods developed later. In this paper, we introduce a dataset, called DAVIS-Ag, to promote more extensive research on Domain-inspired Active VISion in Agriculture. Specifically, we leveraged our open-source "AgML" framework and the 3D plant simulator "Helios" to produce 502K RGB images from 30K densely sampled spatial locations in 632 synthetic orchards. Plant environments of strawberries, tomatoes, and grapes are considered at two different scales (i.e., Single-Plant and Multi-Plant). Useful labels are also provided for each image, including (1) bounding boxes and (2) instance segmentation masks for all identifiable fruits, as well as (3) pointers to the images of other viewpoints that are reachable by executing an action, so that active viewpoint selection can be simulated at each time step. Using DAVIS-Ag, we visualize motivating examples in which fruit visibility changes dramatically with the pose of the camera, primarily due to occlusions by other plant components such as leaves. Furthermore, we present several baseline models with experimental results for benchmarking on the task of target visibility maximization. Transferability to real strawberry environments is also investigated to demonstrate the feasibility of using the dataset for prototyping real-world solutions. For future research, our dataset is made publicly available online: https://github.com/ctyeong/DAVIS-Ag.
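The action pointers described above can be pictured as a graph in which each image stores links to the images reachable by one action. The following is a minimal sketch under stated assumptions, not the dataset's actual file format or any of the paper's baselines: the `VIEWS` dictionary, its field names (`n_visible`, `next`), the viewpoint IDs, and the greedy stopping rule are all hypothetical. Only the action indexing (0 to 6 for forward, backward, left, right, up, down, done) follows the paper's Figure 5 caption.

```python
from enum import IntEnum


class Action(IntEnum):
    """Action indices as listed in the Figure 5 caption."""
    FORWARD = 0
    BACKWARD = 1
    LEFT = 2
    RIGHT = 3
    UP = 4
    DOWN = 5
    DONE = 6


# Hypothetical toy scene: each viewpoint records how many fruits are visible
# and which viewpoint each available action leads to (the "pointers").
VIEWS = {
    "v0": {"n_visible": 3, "next": {Action.LEFT: "v1", Action.UP: "v2"}},
    "v1": {"n_visible": 5, "next": {Action.RIGHT: "v0", Action.UP: "v3"}},
    "v2": {"n_visible": 4, "next": {Action.DOWN: "v0", Action.LEFT: "v3"}},
    "v3": {"n_visible": 9, "next": {Action.DOWN: "v1", Action.RIGHT: "v2"}},
}


def greedy_episode(views, start, max_steps=10):
    """Move to the neighbor with the most visible fruits until no neighbor improves.

    This greedy rule is an illustrative stand-in, not a method from the paper.
    """
    current = start
    actions = []
    for _ in range(max_steps):
        neighbors = views[current]["next"]
        # Pick the reachable viewpoint with the highest visible-fruit count;
        # fall back to staying in place if there are no neighbors.
        best_action, best_view = max(
            neighbors.items(),
            key=lambda item: views[item[1]]["n_visible"],
            default=(Action.DONE, current),
        )
        if views[best_view]["n_visible"] <= views[current]["n_visible"]:
            actions.append(Action.DONE)  # no improvement available: stop
            break
        actions.append(best_action)
        current = best_view
    return current, actions


final_view, taken = greedy_episode(VIEWS, "v0")
print(final_view, [a.name for a in taken])
```

Because every transition is a lookup into precomputed images, an episode of this kind needs no renderer or physical robot at run time, which is what makes such a dataset convenient for prototyping viewpoint-selection policies.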

Paper Structure (20 sections, 1 equation, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Example images from single-plant scenarios of DAVIS-Ag: \ref{fig:single_vine}--\ref{fig:single_vine_seg} goblet vine, \ref{fig:single_strawberry} strawberry, and \ref{fig:single_tomato} tomato. Fruit labels are visualized as bounding boxes in \ref{fig:single_strawberry}--\ref{fig:single_tomato}, and the instance segmentation of \ref{fig:single_vine} is shown in \ref{fig:single_vine_seg}.
  • Figure 2: Colors indicate the number of strawberries visible from each camera position (drawn as circles) in an example environment where a single plant at the center has 24 fruits in total. Heights of 0.50 m, 0.75 m, and 1.00 m were considered.
  • Figure 3: Histograms depicting \ref{fig:motiv_diff} the differences in the numbers of strawberries observed at various altitudes and \ref{fig:motiv_novel} the numbers of newly found instances at higher altitudes. $H_k$ denotes the set of fruits observable at altitude $z_k$ from fixed locations within the example of \ref{fig:motiv_spatial}, where $z_1 < z_2 < z_3$.
  • Figure 4: \ref{fig:learn_curve} Episodic rewards over 300K interactions while A2C learns the TVM task from SP-Strawberry data, smoothed with a rolling average for clarity. Frequencies of \ref{fig:oer_dist} OER scores for all available strawberry instances and \ref{fig:sp_st_acts_dist} selected actions from tests.
  • Figure 5: Action choices in 100 representative test episodes in SP-Strawberry farms. Actions 0 to 6 correspond to forward, backward, left, right, up, down, and done, respectively.
  • ...and 1 more figure
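The two quantities histogrammed in Figure 3 can be computed directly from the sets $H_k$ defined in its caption. The sketch below is only an illustration under stated assumptions: the fruit IDs are invented, the sign convention for the count difference (higher altitude minus lower) is assumed, and the function name `altitude_stats` is hypothetical. The three altitudes match those given in the Figure 2 caption.

```python
from itertools import combinations

# Hypothetical observations at one fixed location: altitude in meters mapped
# to the set of fruit IDs visible from that altitude (H_1, H_2, H_3).
H = {
    0.50: {1, 2, 3, 4},
    0.75: {3, 4, 5, 6, 7},
    1.00: {4, 7, 8},
}


def altitude_stats(observed):
    """For each pair of altitudes (lower, higher), compare the visible sets."""
    stats = {}
    for z_low, z_high in combinations(sorted(observed), 2):
        low, high = observed[z_low], observed[z_high]
        stats[(z_low, z_high)] = {
            # Assumed convention: count at the higher altitude minus the lower.
            "count_diff": len(high) - len(low),
            # Fruits seen at the higher altitude that were hidden at the lower.
            "newly_found": len(high - low),
        }
    return stats


for pair, values in altitude_stats(H).items():
    print(pair, values)
```

The toy values show why both quantities are worth tracking: a higher viewpoint may see fewer fruits overall and still reveal instances that were occluded from below.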