Weakly Supervised Ephemeral Gully Detection In Remote Sensing Images Using Vision Language Models
Seyed Mohamad Ali Tousi, Ramy Farag, John A. Lory, G. N. DeSouza
TL;DR
Ephemeral gullies (EGs) in agricultural fields are difficult to detect because of their transient nature and the scarcity of labeled data. The authors introduce a weakly supervised pipeline with three parts. Vision Language Models (VLMs) act as labeling functions, a Snorkel-style label model fuses their noisy labels into probabilistic pseudo-labels, and a noise-aware student is trained on these pseudo-labels. They also release a dataset of more than 18,000 high-resolution remote-sensing images, made up of a large unlabeled set and a labeled evaluation set, which enables benchmarking of semi-supervised approaches to EG detection. Experimental results show that the weakly supervised student outperforms both the individual VLMs and the label model, achieving higher accuracy and negative predictive value. Code and data are publicly available to support practical adoption.
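The label-model step can be illustrated with a minimal sketch. Snorkel's own LabelModel uses a different estimation procedure; as a simpler stand-in, the code below fits a one-coin Dawid-Skene model by expectation-maximization, estimating one accuracy per VLM labeling function and turning each image's votes into a probability that a gully is present. The vote encoding (1, 0, `None` for an abstain) and the function name `fit_label_model` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vote = Optional[int]  # 1 = gully, 0 = no gully, None = labeling function abstained


def _clamp(x: float, lo: float = 0.01, hi: float = 0.99) -> float:
    # Keep probabilities away from 0 and 1 so the log-odds stay finite.
    return max(lo, min(hi, x))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_label_model(
    votes: Sequence[Sequence[Vote]], n_iter: int = 50, init_acc: float = 0.7
) -> Tuple[List[float], List[float]]:
    """One-coin Dawid-Skene EM over binary votes (rows = images, columns = VLMs).

    Assumes votes are conditionally independent given the true label and that
    each labeling function j has a single accuracy acc[j].
    Returns (soft labels P(y=1) per image, estimated accuracy per labeling function).
    """
    n_lfs = len(votes[0])
    acc = [init_acc] * n_lfs
    prior = 0.5
    probs: List[float] = []
    for _ in range(n_iter):
        # E-step: posterior P(y=1 | votes) for every image.
        probs = []
        for row in votes:
            log_odds = math.log(prior / (1 - prior))
            for j, v in enumerate(row):
                if v is None:
                    continue  # an abstain carries no evidence
                w = math.log(acc[j] / (1 - acc[j]))
                log_odds += w if v == 1 else -w
            probs.append(_sigmoid(log_odds))
        # M-step: accuracy = expected agreement between a vote and the soft label.
        for j in range(n_lfs):
            agree, count = 0.0, 0
            for row, p in zip(votes, probs):
                if row[j] is None:
                    continue
                agree += p if row[j] == 1 else 1 - p
                count += 1
            if count:
                acc[j] = _clamp(agree / count)
        # Class prior = mean soft label.
        prior = _clamp(sum(probs) / len(probs))
    return probs, acc
```

For example, with three VLMs voting on six images, `votes = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1], [1, None, 1], [0, 0, None]]`, the first two VLMs never disagree with the consensus and so receive higher estimated accuracies than the third, which lets their votes dominate wherever the VLMs disagree.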
Abstract
Among soil erosion problems, ephemeral gullies (EGs) are one of the most concerning phenomena occurring in agricultural fields. Their short temporal cycles make them difficult to detect automatically with classical computer vision approaches and remote sensing. Moreover, because accurate labeled data are scarce and difficult to produce, automatic detection of ephemeral gullies using Machine Learning has been limited to zero-shot approaches, which are hard to implement. To overcome these challenges, we present the first weakly supervised pipeline for the detection of ephemeral gullies. Our method relies on remote sensing and uses Vision Language Models (VLMs) to drastically reduce the labor-intensive task of manual labeling. To achieve this, the method exploits: 1) the knowledge embedded in the VLMs' pretraining; and 2) a teacher-student model in which the teacher learns from noisy labels produced by the VLMs, and the student learns by weak supervision from teacher-generated labels using a noise-aware loss function. We also make available the first-of-its-kind dataset for semi-supervised detection of ephemeral gullies from remotely sensed images. The dataset consists of a number of locations labeled by a group of soil and plant scientists, as well as a large number of unlabeled locations, and comprises more than 18,000 high-resolution remote-sensing images acquired over 13 years. Our experimental results demonstrate the validity of our approach: a student model trained with weak supervision outperforms both the VLMs and the label model itself. The code and dataset for this work are publicly available.
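How a student can learn from teacher-generated soft labels is sketched below. One common form of noise-aware loss, used for end-model training in the data-programming line of work behind Snorkel, is the expected cross-entropy under the probabilistic label; the paper's exact loss and student architecture may differ. The sketch assumes a hypothetical logistic-regression student over precomputed feature vectors (a real student would be an image model), and the names `noise_aware_bce`, `train_student` and `predict` are illustrative only.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def noise_aware_bce(q: float, p: float, eps: float = 1e-7) -> float:
    """Expected binary cross-entropy of prediction q when the label is 1 with probability p."""
    q = min(max(q, eps), 1 - eps)
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    # The last weight is the bias; zip stops at len(x), so it is excluded from the dot product.
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return _sigmoid(z)


def train_student(
    features: Sequence[Sequence[float]],
    soft_labels: Sequence[float],
    lr: float = 0.5,
    epochs: int = 200,
) -> List[float]:
    """Fit a logistic-regression student to soft pseudo-labels by full-batch gradient descent.

    With a sigmoid output, d(noise_aware_bce)/d(logit) = q - p, so an uncertain
    pseudo-label (p near 0.5) pulls the prediction toward 0.5 instead of forcing a hard class.
    """
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, p in zip(features, soft_labels):
            err = predict(w, x) - p
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w
```

Training on the probabilities themselves, rather than thresholding them into hard labels first, keeps the label model's uncertainty in the student's objective: the loss for a given pseudo-label p is smallest when the student predicts q = p.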
