A Zero-Shot Learning Approach for Ephemeral Gully Detection from Remote Sensing using Vision Language Models

Seyed Mohamad Ali Tousi, Ramy Farag, Jacket Demby's, Gbenga Omotara, John A. Lory, G. N. DeSouza

TL;DR

Ephemeral gullies drive soil erosion and are hard to detect when labeled data is limited. The authors propose three zero-shot pipelines built on Vision-Language Models (VLMs), including a VQA+LLM reasoning path, compare them with a transfer-learning baseline, and release the first expert-labeled remote-sensing dataset for ephemeral gully detection. Across extensive experiments, the zero-shot methods reach accuracy above 70% and an F1-score near 0.8, with Qwen2-VL performing best in direct classification and Llama 3.2-Vision performing best as a reasoning aggregator; optimizing the question set improves results further. The work provides practical, data-efficient approaches for detecting gullies in remote-sensing imagery, highlights the tradeoffs between VLM visual understanding and LLM reasoning, and offers a public dataset to foster future research and benchmarking.

Abstract

Ephemeral gullies are a primary cause of soil erosion, and their reliable, accurate, and early detection will facilitate significant improvements in the sustainability of global agricultural systems. In our view, prior research has not successfully addressed automated detection of ephemeral gullies from remotely sensed images, so for the first time, we present and evaluate three successful pipelines for ephemeral gully detection. Our pipelines utilize remotely sensed images acquired from specific agricultural areas over a period of time. The pipelines were tested with various choices of Vision-Language Models (VLMs), and they classified the images based on the presence of ephemeral gullies with accuracy higher than 70% and an F1-score close to 80% for positive gully detection. Additionally, we developed the first public dataset for ephemeral gully detection, labeled by a team of soil- and plant-science experts. To evaluate the proposed pipelines, we employed a variety of zero-shot classification methods based on State-of-the-Art (SOTA) open-source VLMs. In addition, we compare the same pipelines with a transfer learning approach. Extensive experiments were conducted to validate the detection pipelines and to analyze the impact of hyperparameter changes on their performance. The experimental results demonstrate that the proposed zero-shot classification pipelines are highly effective in detecting ephemeral gullies in a scenario where classification datasets are scarce.

Paper Structure

This paper contains 25 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The proposed independent pipelines to detect ephemeral gullies. We explore three types of pipelines. (A) Feeding the remote sensing RGB images directly to SOTA zero-shot classification methods such as CLIP [radford2021learning], CuPL [pratt2023does], and a variety of VLMs including Llama 3.2-Vision [touvron2023llama], Qwen [bai2023qwen], and Llava [liu2023visual]. (B) Inspired by [toubal2024modeling], we pass the RGB images through a VQA system, which responds to a series of Yes/No questions regarding visual attributes indicative of ephemeral gullies. An LLM then aggregates these responses to classify the images. (C) Similar to (B), we propose asking the VLM a single descriptive question about the image, with an LLM interpreting the response to make a final classification.
  • Figure 2: Analysis of responses across our body of 19 questions obtained with Llama3.2-90b VLM on the test set.
  • Figure 3: Analysis of responses across our body of 15 questions obtained with Qwen2-VL-72b VLM on the test set.
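
The control flow of pipeline (B) in Figure 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `vqa` and `llm` are hypothetical callables standing in for a VLM and an LLM, and the function names, prompt wording, and answer-parsing rule are all invented here for clarity.

```python
from typing import Any, Callable, Dict, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
#   vqa(image, question) -> free-text reply from a VLM
#   llm(prompt)          -> free-text reply from an LLM
VQA = Callable[[Any, str], str]
LLM = Callable[[str], str]


def normalize_answer(raw: str) -> bool:
    """Map a free-text reply to a boolean; only replies starting with 'yes' count as Yes."""
    return raw.strip().lower().startswith("yes")


def collect_answers(image: Any, questions: Sequence[str], vqa: VQA) -> Dict[str, bool]:
    """Ask the VQA model every Yes/No question about one image."""
    return {q: normalize_answer(vqa(image, q)) for q in questions}


def build_aggregation_prompt(answers: Dict[str, bool]) -> str:
    """Format the question/answer pairs into one prompt for the aggregating LLM."""
    lines = [f"- {q} {'Yes' if a else 'No'}" for q, a in answers.items()]
    return (
        "Answers about a remote sensing image of an agricultural field:\n"
        + "\n".join(lines)
        + "\nDoes the image contain an ephemeral gully? Answer Yes or No."
    )


def classify(image: Any, questions: Sequence[str], vqa: VQA, llm: LLM) -> bool:
    """Pipeline (B) sketch: VQA answers first, then the LLM aggregates them into a label."""
    answers = collect_answers(image, questions, vqa)
    return normalize_answer(llm(build_aggregation_prompt(answers)))
```

Keeping the VQA step and the aggregation step behind separate callables reflects the split the caption describes, and it would let the question set or the aggregating model be swapped independently when comparing configurations.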