TreeFormers -- An Exploration of Vision Transformers for Deforestation Driver Classification

Uche Ochuba

TL;DR

This work addresses the problem of identifying drivers of deforestation from satellite imagery of Indonesian forests. It investigates vision transformers (ViTs) by fine-tuning a pre-trained model on a dataset of 332x332 composite images and evaluating rotation-based data augmentation, along with an attempted integration of longitudinal data. The Rotation ViT achieves the highest test accuracy of 72.9%, outperforming logistic regression and CNN baselines and approaching the performance of contemporary CNN-based methods, while longitudinal embedding offers limited benefits. The results demonstrate ViTs' strong potential for deforestation-driver classification and highlight the value of rotation-aware augmentation for imbalanced remote-sensing datasets, with future work exploring longitudinal integration and alternate pre-trained ViTs for further improvements.

Abstract

This paper addresses the critical issue of deforestation by exploring vision transformers (ViTs) for classifying the drivers of deforestation from satellite imagery of Indonesian forests. The input to my algorithm is a 332x332-pixel satellite image, and I employ a ViT architecture to predict one of four deforestation driver classes: grassland shrubland, other, plantation, or smallholder agriculture. My methodology involves fine-tuning a pre-trained ViT on a dataset from the Stanford ML Group. To improve classification accuracy, I experiment with rotational data augmentation (among other augmentation techniques) and with embedding longitudinal data. I also tried training a ViT from scratch. The best model achieves a test accuracy of 72.9%, a significant improvement over the baseline models. I analyze error patterns and metrics to highlight the strengths and limitations of my approach. This research contributes to ongoing efforts to address deforestation through advanced computer vision techniques.
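
The rotational augmentation mentioned above can be illustrated with a minimal sketch. The abstract does not state which rotation angles are used, so the example assumes multiples of 90 degrees, which map a square image onto the same pixel grid without interpolation. The function name `rotation_augment`, the three-channel input, and the class index order are hypothetical choices made for illustration, not details from the paper.

```python
import numpy as np

# Class names come from the abstract; the index order is an assumption.
CLASSES = ["grassland shrubland", "other", "plantation", "smallholder agriculture"]


def rotation_augment(image, label):
    """Return the four 90-degree rotations of a square image, each paired with the label.

    image: array of shape (H, W, C) with H == W, e.g. 332 x 332 x 3 (channel count assumed).
    label: integer class index; rotating the scene does not change the driver class.
    """
    if image.shape[0] != image.shape[1]:
        raise ValueError("expected a square image")
    # np.rot90 rotates in the plane of the first two axes and leaves channels untouched.
    return [(np.rot90(image, k=k, axes=(0, 1)).copy(), label) for k in range(4)]


# Random stand-in for one 332x332 composite image.
rng = np.random.default_rng(0)
image = rng.random((332, 332, 3), dtype=np.float32)
augmented = rotation_augment(image, CLASSES.index("plantation"))
```

Each training image yields four labeled samples under this scheme. Arbitrary-angle rotations would additionally require interpolation and a policy for filling the corners.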

Paper Structure (10 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Sample images from the dataset and a visualization of image compositing.
  • Figure 2: Visualizations of perspective change, color jitter, and flip/rotation transforms (Paszke et al., 2019).
  • Figure 3: Left to right: t-SNE visualization of image embeddings, plot of longitude vs. latitude with class labels, sample images with colored bars to embed longitudinal data.
  • Figure 4: Left to right: The ViT architecture (Dosovitskiy et al., 2020), the proposed modified classification head architecture to incorporate longitudinal data into predictions.
  • Figure 5: Statistics over 300 epochs for various models. The highest y-axis tick marks in the three plots are 1.2, 1.4, and 0.8, from left to right.
  • ...and 2 more figures