Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning

Isaac Ray; Alexei Skurikhin

TL;DR

This article introduces a method for fully unsupervised whole-image clustering, designed for massive unlabelled datasets of remote sensing scenes. The method outperforms state-of-the-art zero-shot classification techniques on multiple datasets.

Abstract

This paper proposes a method for unsupervised whole-image clustering of a target dataset of remote sensing scenes with no labels. The method consists of three main steps: (1) finetuning a pretrained deep neural network (DINOv2) on a labelled source remote sensing imagery dataset and using it to extract a feature vector from each image in the target dataset, (2) reducing the dimensionality of these deep features via manifold projection into a low-dimensional Euclidean space, and (3) clustering the embedded features using a Bayesian nonparametric technique to infer the number and membership of clusters simultaneously. The method takes advantage of heterogeneous transfer learning to cluster unseen data with different feature and label distributions. We show that this approach outperforms state-of-the-art zero-shot classification methods on several remote sensing scene classification datasets.
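Step (3) of the abstract infers the number of clusters and their membership at the same time. The following is a minimal, standard-library-only sketch of that idea. It uses DP-means, a small-variance limit of a Dirichlet process Gaussian mixture, purely as a stand-in: this summary does not name the paper's actual clustering algorithm, and the function name `dp_means`, the penalty `lam`, and the toy 2-D points (standing in for embedded features) are all illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def dp_means(points, lam, max_iter=100):
    """Cluster points with DP-means (illustrative stand-in, not the paper's method).

    Works like k-means, except that a point whose squared distance to every
    current centre exceeds lam opens a new cluster, so the number of clusters
    is inferred from the data rather than fixed in advance.
    """
    centres = [centroid(points)]      # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centres]
            k = min(range(len(centres)), key=dists.__getitem__)
            if dists[k] > lam:        # too far from every centre: open a new cluster
                centres.append(p)
                k = len(centres) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centres, dropping clusters that lost all their members.
        members = {}
        for p, k in zip(points, labels):
            members.setdefault(k, []).append(p)
        kept = sorted(members)
        relabel = {old: new for new, old in enumerate(kept)}
        centres = [centroid(members[k]) for k in kept]
        labels = [relabel[k] for k in labels]
        if not changed:
            break
    return labels, centres


# Toy stand-in for embedded features: four tight 3x3 grids of 2-D points.
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10), (10, 10)]
          for dx in offsets for dy in offsets]

labels, centres = dp_means(points, lam=16.0)
print(len(centres), "clusters found")  # prints: 4 clusters found
```

The penalty `lam` plays the role that a concentration parameter plays in a full Bayesian nonparametric model: a larger value makes opening a new cluster more costly, so fewer clusters are found. A full Dirichlet process mixture would additionally give posterior uncertainty over memberships, which this sketch does not attempt.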

Paper Structure (26 sections, 1 equation, 3 figures, 5 tables)

Introduction
Related Works
Deep Clustering
Remote Sensing
Method
Choosing a Feature Extractor
Finetuning
Choosing a Manifold Projection
Choosing a Clustering Algorithm
Experiments
Implementation
Finetuning
Manifold Projection
Clustering Algorithm
SATIN Benchmark: Land Use
...and 11 more sections

Figures (3)

Figure 1: Diagram of the proposed image clustering pipeline: given $n$ unlabelled inference images to cluster, we extract $d$-dimensional features from a DNN trained on a general image dataset and finetuned on a remote sensing dataset. We then project these features onto an approximated manifold of much lower dimension $p$ and apply our clustering algorithm to the resulting low-dimensional features.
Figure 2: Visualisation of the embedded features for the Optimal-31 dataset when using the base DINOv2-L model as a feature extractor (left), and when using the DINOv2-L model finetuned on the RESISC45 dataset (right). The colours and labels denote the ground truth labels. As verified in Table \ref{tbl:main}, the features from the finetuned model are much easier to cluster.
Figure 3: Example images and their ground truth labels from each of the SATIN Land Use task (T2) datasets. Ground truth labels, image size, and spatial resolution (ground sampling distance) are visibly heterogeneous across the datasets.
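
The pipeline in Figure 1 can be written compactly as a composition of three maps. The sketch below reuses $n$, $d$, and $p$ from the caption; the remaining symbols ($x_i$, $f_\theta$, $z_i$, $g$, $y_i$, $h$, $c_i$, $K$) are notation assumed here for illustration and may differ from the paper's own.

```latex
\begin{align*}
z_i &= f_\theta(x_i) \in \mathbb{R}^{d}, & i &= 1,\dots,n && \text{(feature extraction with the finetuned DNN)}\\
y_i &= g(z_i) \in \mathbb{R}^{p}, & p &\ll d && \text{(manifold projection)}\\
(K, c_1,\dots,c_n) &= h(y_1,\dots,y_n), & c_i &\in \{1,\dots,K\} && \text{(clustering; $K$ is inferred, not fixed)}
\end{align*}
```

Here $x_i$ is the $i$-th unlabelled image, $c_i$ is its cluster assignment, and the number of clusters $K$ is an output of the clustering step rather than an input.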
