StrideNET: Swin Transformer for Terrain Recognition with Dynamic Roughness Extraction

Maitreya Shelare, Neha Shigvan, Atharva Satam, Poonam Sonar

TL;DR

StrideNET is a dual-branch framework built on the Swin Transformer that performs terrain recognition and dynamic roughness extraction from remote-sensing images. The Terrain Recognition branch uses hierarchical, shifted-window self-attention to classify grassy, marshy, sandy, and rocky terrains, while the Roughness Extraction branch uses statistical texture analysis to compute a global roughness score and produce interpretive maps. On a 45k-image, four-class dataset, StrideNET achieves over 99% accuracy across classes and outperforms CNN and transformer baselines. The approach targets environmental monitoring, land-use classification, disaster response, and precision agriculture, providing both high-accuracy terrain labels and texture-based surface property estimates.

Abstract

Remote-sensing image classification has progressed rapidly with the rise of convolutional neural networks and, more recently, vision transformers. In contrast with traditional convolutional models, vision transformers use self-attention to capture global relationships and long-range dependencies between image patches. This paper introduces StrideNET, a dual-branch transformer-based model for terrain recognition and surface roughness extraction. The terrain recognition branch employs the Swin Transformer to classify varied terrains, leveraging its ability to capture both local and global features. Complementing this, the roughness extraction branch uses a statistical texture-feature analysis technique to dynamically extract important land surface properties such as roughness and slipperiness. The model was trained on a custom dataset of four terrain classes (grassy, marshy, sandy, and rocky) and outperforms benchmark CNN and transformer-based models, achieving an average test accuracy of over 99% across all classes. Applications of this work include environmental monitoring, land use and land cover classification, disaster response, and precision agriculture.
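
To make the idea of a texture-based roughness score concrete, here is a minimal sketch. It is not the paper's method: the abstract does not name the specific texture statistic, so this example assumes per-block standard deviation of gray levels as a stand-in. The function names `roughness_map` and `global_roughness`, the `block` parameter, and the toy images are all hypothetical.

```python
from statistics import mean, pstdev


def roughness_map(gray, block=4):
    """Per-block standard deviation of gray levels.

    ASSUMPTION: block-wise standard deviation is an illustrative texture
    statistic, not the technique used by StrideNET.
    `gray` is a 2D list of intensities; blocks do not overlap, and any
    partial blocks at the right or bottom edge are ignored.
    """
    height, width = len(gray), len(gray[0])
    rows = []
    for top in range(0, height - block + 1, block):
        row = []
        for left in range(0, width - block + 1, block):
            pixels = [
                gray[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            ]
            row.append(pstdev(pixels))
        rows.append(row)
    return rows


def global_roughness(gray, block=4):
    """Collapse the map into one score by averaging the block deviations."""
    rmap = roughness_map(gray, block)
    return mean(value for row in rmap for value in row)


# Toy inputs (hypothetical): a uniform patch and a high-contrast checkerboard.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]

print("flat:", global_roughness(flat))
print("checker:", global_roughness(checker))
```

The uniform patch scores zero because it has no intensity variation, while the checkerboard scores much higher. The per-block map plays the role of an interpretive map, and its mean plays the role of a single global score.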

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: StrideNET Architecture
  • Figure 2: Terrain Recognition branch
  • Figure 3: Terrain Dataset [Aras_2023]
  • Figure 4: Roughness Extraction
  • Figure 5: Graph representing model accuracy and model loss for training and validation set of proposed StrideNET model.