Remote sensing colour image semantic segmentation of trails created by large herbivorous Mammals

Jose Francisco Diez-Pastor, Francisco Javier Gonzalez-Moya, Pedro Latorre-Carmona, Francisco Javier Perez-Barbería, Ludmila I. Kuncheva, Antonio Canepa-Oneto, Alvar Arnaiz-González, Cesar Garcia-Osorio

TL;DR

The study tackles the problem of mapping grazing trails created by large herbivores from high-resolution aerial imagery to support biodiversity monitoring. It systematically evaluates five semantic segmentation architectures paired with 14 encoders, using 10-fold cross-validation on 100 annotated images, with ground-truth masks generated in the HSI colour space. The key finding is that UNet with the MambaOut encoder delivers the best pixel-level detection performance (IoU and F1), enabling precise reconstruction of trail networks and differentiation from anthropogenic features. This pixel-level approach facilitates temporal monitoring and GIS-based habitat management, representing a significant advance over prior patch-based methods and enabling more robust herbivory assessments across landscapes.
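The two pixel-level metrics named above, IoU and F1, can be illustrated with a minimal sketch on binary masks. This is not the authors' evaluation code: the function names, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration only.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives over all pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return tp / denom if denom else 1.0


def f1(pred, truth):
    """F1 score: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


# Hypothetical 3x4 masks: 1 marks a trail pixel, 0 marks background.
truth = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 1, 0, 0]]
pred = [[0, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0]]

print(iou(pred, truth))  # 3 shared pixels out of 5 in the union
print(f1(pred, truth))   # 6 / 8
```

For a single pair of masks the two metrics are monotonically related (F1 = 2·IoU / (1 + IoU)), so they order predictions identically; averages over many images can still rank models differently.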

Abstract

Identifying spatial regions where biodiversity is threatened is crucial for effective ecosystem conservation and monitoring. In this study, we assessed various machine learning methods to detect grazing trails automatically. We tested five semantic segmentation models combined with 14 different encoder networks. The best combination was UNet with the MambaOut encoder. The proposed solution could serve as the basis for tools that map grazing trails and track their changes over time.
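The evaluation protocol mentioned in the TL;DR, 10-fold cross-validation on 100 annotated images, can be sketched as follows. How the authors actually assigned images to folds is not stated here, so the random shuffle, the fixed seed, and the helper name `k_fold_indices` are assumptions.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Split item indices into k folds and return (train, test) index lists per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumption: plain random assignment
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 100 images and 10 folds: each image is used for testing exactly once.
splits = k_fold_indices(100, 10)
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each architecture and encoder pair would be trained on the 90 training images of a fold and scored on the remaining 10, with the per-fold scores then averaged.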

Paper Structure

This paper contains 9 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Study area. Map of Spain showing the areas (red triangles) from which the images were obtained. The selected mountain systems correspond to: (i) "Cantabrian" mountain range, (ii) "Palencia" mountain range, (iii) "Pyrenees", (iv) "Sierra de la Demanda", and (v) "Sierra de Béjar".
  • Figure 2: Examples of different grazing trail networks in the "Cantabrian" mountain range. (A) Parallel trails on heather, with the dominant layout direction perpendicular to the slope, coordinates 43°1'37.14" N, 5°30'4.25" W (WGS84); (B) As (A) but on a rocky slope, coordinates 43°11'17.56" N, 4°45'24.56" W; (C) Trails on mountain grassland, no dominant layout direction, coordinates 43°2'25.66" N, 6°13'20.43" W; (D) Trails on heather, two dominant directions, coordinates 43°1'35.50" N, 5°29'40.07" W. Some trails are marked with arrows.
  • Figure 3: Example of the original RGB groundtruth image (left columns) and the result of the automatic trail segmentation (right column). The trails are highlighted in orange. Best viewed in colour.
  • Figure 4: Heatmap of Intersection over Union (IoU) performance across all encoder-architecture combinations. Each cell displays the mean IoU for a specific pair, with underlined values indicating the best encoder for each architecture and bold values showing the overall top combination. The row above the heatmap matrix presents architecture rankings (averaged across encoders), while the column at the left shows encoder rankings (averaged across architectures), where lower values indicate better performance. Throughout the figure, darker green shading consistently represents better performance: in the main matrix this corresponds to higher (better) IoU values, while in the rankings it indicates lower (better) average rank values.
  • Figure 5: Heatmap of the $F_1$ performance measure across all encoder-architecture combinations. Each cell displays the mean $F_1$ for a specific pair, with underlined values indicating the best encoder for each architecture and bold values showing the overall top combination. The row above the heatmap matrix presents architecture rankings (averaged across encoders), while the column at the left shows encoder rankings (averaged across architectures), where lower values indicate better performance. Throughout the figure, darker green shading consistently represents better performance: in the main matrix this corresponds to higher (better) $F_1$ values, while in the rankings it indicates lower (better) average rank values.
  • ...and 1 more figure
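The average-rank values described in the captions of Figures 4 and 5 can be illustrated with a short sketch: encoders are ranked within each architecture (rank 1 is the highest score) and the ranks are then averaged across architectures. The score values below are hypothetical, and the tie handling (tied entries share the mean of their ranks) is an assumption, since the captions do not say how ties are resolved.

```python
def ranks_descending(values):
    """Rank values so the highest gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def average_ranks(scores):
    """scores[e][a] is the score of encoder e with architecture a.

    Returns the mean rank of each encoder across architectures (lower is better).
    """
    n_encoders = len(scores)
    n_architectures = len(scores[0])
    totals = [0.0] * n_encoders
    for a in range(n_architectures):
        column = [scores[e][a] for e in range(n_encoders)]
        for e, rank in enumerate(ranks_descending(column)):
            totals[e] += rank
    return [total / n_architectures for total in totals]


# Hypothetical mean IoU values: 3 encoders (rows) x 2 architectures (columns).
scores = [
    [0.50, 0.40],  # encoder A
    [0.45, 0.42],  # encoder B
    [0.30, 0.42],  # encoder C
]
print(average_ranks(scores))
```

Architecture rankings (averaged across encoders) follow from the same function applied to the transposed matrix.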