A Zero-Shot Learning Approach for Ephemeral Gully Detection from Remote Sensing using Vision Language Models
Seyed Mohamad Ali Tousi, Ramy Farag, Jacket Demby's, Gbenga Omotara, John A. Lory, G. N. DeSouza
TL;DR
Ephemeral gullies drive soil erosion and are hard to detect when labeled data are limited. The authors propose three zero-shot pipelines built on Vision-Language Models (VLMs), including a VQA+LLM reasoning path, and compare them against a transfer-learning baseline. They also release the first public, expert-labeled dataset for ephemeral gully detection from remote-sensing imagery. Across extensive experiments, the zero-shot methods achieve strong detection performance (accuracy above 70% and positive-class F1 near 0.8). Qwen2-VL excels at direct classification, and Llama 3.2-Vision stands out as a reasoning aggregator; optimizing the question set further improves results. The work provides practical, data-efficient approaches for detecting gullies in remote-sensing imagery, highlights the tradeoffs between VLM visual understanding and LLM reasoning, and offers a public dataset to support future research and benchmarking.
Abstract
Ephemeral gullies are a primary cause of soil erosion, and their reliable, accurate, and early detection will facilitate significant improvements in the sustainability of global agricultural systems. In our view, prior research has not successfully addressed the automated detection of ephemeral gullies from remotely sensed images, so we present and evaluate, for the first time, three successful pipelines for ephemeral gully detection. Our pipelines use remotely sensed images acquired from specific agricultural areas over a period of time. We tested the pipelines with various choices of Vision-Language Models (VLMs). They classified the images by the presence of ephemeral gullies with an accuracy higher than 70% and an F1-score close to 80% for positive gully detection. Additionally, we developed the first public dataset for ephemeral gully detection, labeled by a team of soil- and plant-science experts. To evaluate the proposed pipelines, we employed a variety of zero-shot classification methods based on State-of-the-Art (SOTA) open-source VLMs. We also compare the same pipelines with a transfer learning approach. Extensive experiments were conducted to validate the detection pipelines and to analyze how hyperparameter changes affect their performance. The experimental results demonstrate that the proposed zero-shot classification pipelines are highly effective in detecting ephemeral gullies in scenarios where classification datasets are scarce.
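The headline results combine two metrics: overall accuracy, and F1 computed for the positive (gully present) class. The minimal sketch below shows how these relate for a binary gully/no-gully classifier. It assumes labels are encoded as 1 = gully present and 0 = absent. The helper name `binary_metrics` and the example labels are hypothetical illustrations, not data or code from the paper. When positives are common, positive-class F1 can exceed accuracy, which is consistent with reporting accuracy above 70% alongside F1 near 0.8.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: accuracy plus positive-class precision/recall/F1.

    Assumes binary labels where 1 = gully present and 0 = gully absent.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count the four confusion-matrix cells for the positive (gully) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall for the positive class.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for 10 image tiles (6 with gullies, 4 without).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 5 true positives, 1 false negative, 2 false positives, and 2 true negatives. That yields an accuracy of 0.7 and a positive-class F1 of 10/13 (about 0.77).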
