Self-Supervised Backbone Framework for Diverse Agricultural Vision Tasks

Sudhir Sornapudi, Rajhans Singh

TL;DR

This work explores how self-supervised representation learning can be applied to diverse agricultural vision tasks by eliminating the need for large-scale annotated datasets. It proposes a lightweight framework that uses SimCLR, a contrastive learning approach, to pre-train a ResNet-50 backbone on a large, unannotated dataset of real-world agricultural field images.

Abstract

Computer vision is changing agriculture through its ability to make farming a data-driven, precise, and sustainable industry. Deep learning has enabled agricultural vision systems to analyze vast, complex visual data, but it relies heavily on the availability of large annotated datasets. This remains a bottleneck because manual labeling is error-prone, time-consuming, and expensive. The lack of efficient labeling approaches led us to consider self-supervised learning, which learns meaningful feature representations directly from raw agricultural image data. In this work, we explore how self-supervised representation learning can be applied to diverse agricultural vision tasks by eliminating the need for large-scale annotated datasets. We propose a lightweight framework that uses SimCLR, a contrastive learning approach, to pre-train a ResNet-50 backbone on a large, unannotated dataset of real-world agricultural field images. Our experimental analysis and results indicate that the model learns robust features applicable to the broad range of downstream agricultural tasks discussed in the paper. Additionally, the reduced reliance on annotated data makes our approach more cost-effective and accessible, paving the way for broader adoption of computer vision in agriculture.
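
To make the contrastive pre-training idea concrete, the following is a minimal sketch of the normalized temperature-scaled cross-entropy (NT-Xent) objective that SimCLR is generally trained with. It is an illustration only, not the authors' implementation: the function names, the toy two-dimensional embeddings, and the temperature of 0.5 are assumptions, and in practice the embeddings would come from the ResNet-50 backbone followed by a projection head.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss over a batch of paired views.

    views_a[i] and views_b[i] are embeddings of two augmentations of the
    same image i (hypothetical inputs; temperature is an assumed value).
    """
    n = len(views_a)
    z = list(views_a) + list(views_b)  # 2N embeddings in total
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other embedding (the anchor itself is excluded).
        logits = [
            cosine_similarity(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i
        ]
        pos_logit = cosine_similarity(z[i], z[positive]) / temperature
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy example: two images, each with two views embedded in 2-D.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

The loss is small when the two views of the same image are embedded close together and far from the other images in the batch. This is the signal that lets a backbone learn features without any labels.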

Paper Structure

This paper contains 10 sections, 1 equation, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Wide range of applications based on representations learned from self-supervised modeling.
  • Figure 2: Illustration of the two-stage process: self-supervised pre-training by contrasting different views of the same image and label-light supervised fine-tuning for downstream tasks.
  • Figure 3: Corteva real-world unlabeled agriculture image data.
  • Figure 4: 3D point cloud visualization of hold-out Corteva test data feature representations with color-coded ground-truth labels.
  • Figure 5: Instance segmentation and detection results from a Mask R-CNN model: output of the model fine-tuned from self-supervised pre-trained ResNet-50 weights (left) and the ground-truth annotation (right).
  • ...and 5 more figures