Weakly Supervised Ephemeral Gully Detection In Remote Sensing Images Using Vision Language Models

Seyed Mohamad Ali Tousi, Ramy Farag, John A. Lory, G. N. DeSouza

TL;DR

Ephemeral gullies in agricultural fields are difficult to detect because they are transient and labeled data is scarce. The authors introduce a weakly supervised pipeline that uses Vision Language Models (VLMs) as labeling functions, a Snorkel-style label model that fuses the noisy VLM labels into probabilistic pseudo-labels, and a student model trained on those pseudo-labels with a noise-aware loss. They also release a dataset of more than 18,000 high-resolution remote-sensing images, comprising a large unlabeled portion and an expert-labeled evaluation set, to support benchmarking of semi-supervised ephemeral gully (EG) detection. In their experiments, the weakly supervised student outperforms both the individual VLMs and the label model, with improved accuracy and negative predictive value. Code and data are publicly available.

Abstract

Among soil erosion problems, Ephemeral Gullies are one of the most concerning phenomena occurring in agricultural fields. Their short temporal cycles make them difficult to detect automatically using classical computer vision approaches and remote sensing. Also, because accurately labeled data is scarce and difficult to produce, automatic detection of ephemeral gullies using Machine Learning is limited to zero-shot approaches, which are hard to implement. To overcome these challenges, we present the first weakly supervised pipeline for detection of ephemeral gullies. Our method relies on remote sensing and uses Vision Language Models (VLMs) to drastically reduce the labor-intensive task of manual labeling. To achieve that, the method exploits: 1) the knowledge embedded in the VLM's pretraining; 2) a teacher-student model where the teacher learns from noisy labels coming from the VLMs, and the student learns by weak supervision using teacher-generated labels and a noise-aware loss function. We also make available the first-of-its-kind dataset for semi-supervised detection of ephemeral gullies from remotely sensed images. The dataset consists of a number of locations labeled by a group of soil and plant scientists, as well as a large number of unlabeled locations. The dataset represents more than 18,000 high-resolution remote-sensing images obtained over the course of 13 years. Our experimental results demonstrate the validity of our approach by showing that a student model trained with weak supervision outperforms both the VLMs and the label model itself. The code and dataset for this work are made publicly available.
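
The abstract does not say which noise-aware loss the student uses. As a minimal sketch, assume the teacher emits a probability that an ephemeral gully is present and the student is penalized by the expected binary cross-entropy under that soft label. This is one common choice for training on probabilistic labels and is not necessarily the authors' loss. The function names are hypothetical.

```python
import math


def soft_label_bce(p_student: float, p_teacher: float, eps: float = 1e-7) -> float:
    """Expected binary cross-entropy of a student prediction under a soft teacher label.

    p_student: student's predicted probability that a gully is present.
    p_teacher: teacher's (label model's) probability for the same location.
    """
    # Clamp to avoid log(0) when the student is fully confident.
    p = min(max(p_student, eps), 1.0 - eps)
    return -(p_teacher * math.log(p) + (1.0 - p_teacher) * math.log(1.0 - p))


def batch_loss(student_probs, teacher_probs):
    """Mean soft-label loss over a batch of locations."""
    losses = [soft_label_bce(s, t) for s, t in zip(student_probs, teacher_probs)]
    return sum(losses) / len(losses)


# Illustrative values only; they are not taken from the paper.
print(batch_loss([0.9, 0.2, 0.6], [0.8, 0.1, 0.5]))
```

Because the target is a probability rather than a hard 0/1 label, an uncertain pseudo-label (near 0.5) pulls the student less strongly toward either class than a confident one does, which is the sense in which such a loss accounts for label noise.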

Paper Structure

This paper contains 21 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: We propose to use a combination of Vision Language Models (VLMs) and Weak Supervision Frameworks (WSF) to train a classifier that detects the presence of Ephemeral Gullies (EGs) in agricultural fields.
  • Figure 2: The proposed EG detection pipeline. The remote sensing RGB images are fed to the VLMs (Qwen2.5-VL and Llama3.2-Vision; bai2023qwen, meta2024llama) with two different prompting paradigms: 1) single question, and 2) multi-question. The resulting noisy labels produced by the VLMs are used to train a probabilistic label model (Snorkel; ratner2017snorkel). The trained label model then produces pseudo-labels, which are used to train a student model.
  • Figure 3: The proposed student model uses FlexiViT (beyer2023flexivit) as its feature extractor backbone. The patch sizes are chosen specifically to represent the same geographical area in both low- and high-resolution images. An MLP aggregates the extracted features and provides the final classification.
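
The label-fusion step in Figure 2 can be illustrated with a simplified stand-in. Snorkel's label model estimates the accuracy of each labeling function without ground truth. The sketch below skips that estimation and assumes the accuracies are already known, then combines the votes as accuracy-weighted log-odds under a conditional-independence assumption. The function name, the accuracy values, and the four-source layout (two VLMs, two prompting paradigms each) are hypothetical and not taken from the paper.

```python
import math

ABSTAIN = -1  # a labeling function may decline to vote


def fuse_votes(votes, accuracies, prior=0.5):
    """Return P(gully present) from binary votes.

    votes: 1 (gully), 0 (no gully), or ABSTAIN, one per labeling function.
    accuracies: assumed probability that each function votes correctly.
    prior: assumed prior probability that a gully is present.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions carry no evidence
        weight = math.log(acc / (1.0 - acc))
        log_odds += weight if vote == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical votes from four sources for one location:
# (VLM A, single question), (VLM A, multi-question),
# (VLM B, single question), (VLM B, multi-question).
votes = [1, 1, 0, ABSTAIN]
accuracies = [0.70, 0.75, 0.65, 0.70]  # illustrative, not measured
pseudo_label = fuse_votes(votes, accuracies)
print(f"P(gully present) = {pseudo_label:.3f}")
```

The resulting probability plays the role of the pseudo-label passed to the student. In an actual pipeline the accuracies would be learned by the label model rather than supplied by hand.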