PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation

Tianqi Wei; Zhi Chen; Xin Yu; Scott Chapman; Paul Melloy; Zi Huang

TL;DR

PlantSeg is a large-scale, in-the-wild dataset of 11,400 images with disease segmentation masks, plus 8,000 healthy plant images categorized by plant type. It lets researchers evaluate image classification methods and provides a foundation for developing and benchmarking plant disease segmentation algorithms.

Abstract

Plant diseases pose significant threats to agriculture, necessitating proper diagnosis and effective treatment to safeguard crop yields. To automate the diagnosis process, image segmentation is usually adopted to precisely identify diseased regions, thereby advancing precision agriculture. Developing robust image segmentation models for plant diseases demands high-quality annotations across numerous images. However, existing plant disease datasets typically lack segmentation labels and are often confined to controlled laboratory settings, which do not adequately reflect the complexity of natural environments. Motivated by this gap, we established PlantSeg, a large-scale segmentation dataset for plant diseases. PlantSeg distinguishes itself from existing datasets in three key aspects. (1) Annotation type: Unlike the majority of existing datasets, which contain only class labels or bounding boxes, each image in PlantSeg includes detailed, high-quality segmentation masks associated with plant types and disease names. (2) Image source: Unlike typical datasets that contain images from laboratory settings, PlantSeg primarily comprises in-the-wild plant disease images. This choice enhances practical applicability, as the trained models can be applied to integrated disease management. (3) Scale: PlantSeg is extensive, featuring 11,400 images with disease segmentation masks and an additional 8,000 healthy plant images categorized by plant type. Extensive technical experiments validate the high quality of PlantSeg's annotations. This dataset not only allows researchers to evaluate their image classification methods but also provides a critical foundation for developing and benchmarking advanced plant disease segmentation algorithms.

Paper Structure (14 sections, 1 equation, 9 figures, 3 tables)

Background & Summary
Methods
Data Records
Technical Validation
Code availability
Author contributions statement
Competing interests

Figures (9)

Figure 1: Example images from PlantVillage and from our dataset. Because PlantVillage was collected in laboratory environments, each of its images contains only one leaf against a uniform background, while images in our dataset feature much more complex backgrounds, various viewpoints, and different lighting conditions.
Figure 2: Locations where the source images were acquired. The size of each circle represents the number of images acquired from that address, and the color depth indicates the density of addresses within the nearby region.
Figure 3: Examples of images with annotated polygons on the disease-affected areas.
Figure 4: The curation process of the PlantSeg dataset involves three main steps: image acquisition, data cleaning, and annotation. In the image acquisition stage, images were collected from various internet sources using identified keywords and then stored according to their categories. During the data cleaning phase, incorrect images were identified and removed. For the segmentation annotation process, annotators used LabelMe to annotate the cleaned images. These annotations were subsequently reviewed by experts and saved in JSON files.
Figure 5: Disease distribution in PlantSeg by plant and socioeconomic classification. The height of each bar represents the number of diseases associated with that plant.
...and 4 more figures
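
The Figure 4 caption notes that polygon annotations were drawn in LabelMe and saved as JSON. As a rough illustration of how such a file might be turned into a binary disease mask, the sketch below rasterizes polygons with a standard even-odd point-in-polygon test. It assumes the usual LabelMe JSON keys (`imageHeight`, `imageWidth`, `shapes`, `points`, `shape_type`); the helper names `point_in_polygon` and `labelme_to_mask`, the sample label, and the tiny 6x6 annotation are hypothetical and are not taken from the PlantSeg release, whose exact file layout may differ.

```python
import json


def point_in_polygon(x, y, polygon):
    """Even-odd rule: return True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def labelme_to_mask(annotation):
    """Rasterize LabelMe-style polygon shapes into a 0/1 mask (list of rows)."""
    height = annotation["imageHeight"]
    width = annotation["imageWidth"]
    mask = [[0] * width for _ in range(height)]
    for shape in annotation["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # this sketch handles polygons only
        polygon = [(float(px), float(py)) for px, py in shape["points"]]
        if len(polygon) < 3:
            continue
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Restrict the scan to the polygon's bounding box, clipped to the image.
        x_min, x_max = max(0, int(min(xs))), min(width - 1, int(max(xs)))
        y_min, y_max = max(0, int(min(ys))), min(height - 1, int(max(ys)))
        for row in range(y_min, y_max + 1):
            for col in range(x_min, x_max + 1):
                # Sample each pixel at its centre.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = 1
    return mask


if __name__ == "__main__":
    # Made-up annotation for illustration; a real file would be read with json.load().
    text = """{"imageHeight": 6, "imageWidth": 6,
               "shapes": [{"label": "example disease", "shape_type": "polygon",
                           "points": [[1, 1], [4, 1], [4, 4], [1, 4]]}]}"""
    mask = labelme_to_mask(json.loads(text))
    for row in mask:
        print("".join(str(v) for v in row))
```

The pure-Python loop keeps the sketch free of dependencies; for full-resolution images a library rasterizer would be a more practical choice.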
