Leveraging Convolutional and Graph Networks for an Unsupervised Remote Sensing Labelling Tool

Tulsi Patel, Mark W. Jones, Thomas Redfern

TL;DR

The paper addresses the high cost of labelling remote sensing (RS) imagery by proposing an unsupervised, context-aware labelling pipeline that fuses CNN-based feature extraction with graph neural networks (GNNs) to encode spatial neighbourhoods. The pipeline segments images with SLIC, extracts activation maps with a U-Net, and propagates contextual information through a GNN, yielding a rotation-invariant embedding that can be visualised with UMAP for interactive labelling. The approach is validated through feature- and context-aware evaluations, U-Net benchmarking, and a tool-focused assessment, which show cohesive embeddings, finer labelling granularity, and practical interactive performance. The framework could streamline large-scale RS dataset generation by enabling rapid, fine-grained labelling with less reliance on predefined categories or extensive expert input.
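The neighbourhood-encoding idea in the summary above can be illustrated with a minimal sketch. It assumes each segment already has a feature vector (for example, pooled U-Net activations) and that adjacency between segments is given as an edge list. The function name `aggregate_neighbours`, the `self_weight` parameter, and the mean-aggregation rule are illustrative assumptions, not the paper's GNN layers.

```python
import numpy as np

def aggregate_neighbours(features, edges, self_weight=0.5):
    """Blend each segment's features with the mean of its neighbours' features.

    features: (n_segments, n_features) array (hypothetical per-segment features).
    edges: iterable of (i, j) index pairs for adjacent segments (undirected).
    self_weight: assumed mixing weight that preserves the segment's own information.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    adj = np.zeros((n, n), dtype=float)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    degree = adj.sum(axis=1, keepdims=True)
    # Mean over neighbours; the order of neighbours does not matter,
    # so the result is unchanged if the segment layout is rotated.
    neighbour_mean = adj @ features / np.maximum(degree, 1.0)
    # Segments with no neighbours simply keep their own features.
    isolated = degree[:, 0] == 0
    neighbour_mean[isolated] = features[isolated]
    return self_weight * features + (1.0 - self_weight) * neighbour_mean

# Three segments in a chain: 0 - 1 - 2
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (1, 2)]
context = aggregate_neighbours(features, edges)
```

Because the aggregation depends only on which segments are adjacent, not on where they sit in the image, a sketch like this hints at why graph-based context can be insensitive to rotation.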

Abstract

Machine learning for remote sensing imaging relies on up-to-date and accurate labels for model training and testing. Labelling remote sensing imagery is time- and cost-intensive, requiring expert analysis. Previous labelling tools rely on pre-labelled data for training in order to label new unseen data. In this work, we define an unsupervised pipeline for finding and labelling geographical areas of similar context and content within Sentinel-2 satellite imagery. Our approach removes limitations of previous methods by utilising segmentation with convolutional and graph neural networks to encode a more robust feature space for image comparison. Unlike previous approaches, we segment the image into homogeneous regions of pixels that are grouped based on colour and spatial similarity. Graph neural networks are used to aggregate information about the surrounding segments, enabling each segment's feature representation to encode the local neighbourhood whilst preserving its own local information. This reduces outliers in the labelling tool, allows users to label at a granular level, and allows a rotationally invariant semantic relationship at the image level to be formed within the encoding space. Our pipeline achieves high contextual consistency, with similarity scores of SSIM = 0.96 and SAM = 0.21 under context-aware evaluation, demonstrating robust organisation of the feature space for interactive labelling.
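For readers unfamiliar with the SAM score quoted above, the following sketch shows the standard spectral angle mapper definition: the angle between two spectra treated as vectors, where 0 means identical spectral shape. It assumes the angle is reported in radians; how the paper pools the score across pixels, segments, or chips is not stated in this section, and the function name `spectral_angle` is hypothetical.

```python
import numpy as np

def spectral_angle(a, b, eps=1e-12):
    """Angle (radians) between two spectra; smaller means more similar."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    # Clip guards against values just outside [-1, 1] from rounding error.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Same spectral shape at different brightness gives an angle near zero.
same_shape = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Orthogonal spectra give the maximum angle for non-negative data, pi / 2.
orthogonal = spectral_angle([1.0, 0.0], [0.0, 1.0])
```

The measure ignores overall magnitude, so it compares spectral shape rather than brightness.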

Paper Structure

This paper contains 32 sections, 4 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: This diagram illustrates the data flow within the pipeline. Blue boxes represent the contracting layers of the U-Net, red boxes the expansive layers, and yellow the predictive layer. Blue lines indicate skip connections. Red lines show how data is used in the loss function, where the third GNN layer reshapes its input to enable comparison with the targets.
  • Figure 2: UMAP dimension reduction to 2D on the graph matching output of our entire pipeline. At this level, each point represents one chip. The user interactively highlights a resizeable region which can be dragged across the manifold representation. Chip images represented by the 2D points within the highlight are displayed in the pane below. Training did not use any labelled data. (Video frame edited to save space in the paper).
  • Figure 3: Example of the UMAP embedding: a dimensionality reduction of the high-dimensional feature space, $X$, output from the final graph matching stage of our entire pipeline. Training was unsupervised. The images selected by the user's highlights are shown in the display pane, with the labels A-D explained in Section \ref{sec:clusterexplore}.
  • Figure 4: Example of the UMAP embedding space from \cite{patel2023manifold}, showing results influenced by strong texture similarities due to edge alignment, and a comparison with our approach, which introduces rotational invariance to remove this strong alignment.
  • Figure 5: Example (from video) of projecting four images and exploring their segmentations to label them as urban or vegetation. Selection (A), on the left, shows the largely vegetated segments in the first row of images, and (B) shows largely urban development in the same four images, displayed in the second row. (C) is a selection in the manifold of segmentations that relate to golf courses and are present in different images, as displayed in the third row (the golf course example is in the video).
  • ...and 1 more figure
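The interactive highlight described in the Figure 2 caption amounts to picking the points of the 2D embedding that fall inside a user-drawn region. The sketch below assumes a rectangular highlight and an embedding stored as an `(n, 2)` array. The function name `select_in_box` and the sample coordinates are made up for illustration and are not taken from the tool.

```python
import numpy as np

def select_in_box(points, x_min, x_max, y_min, y_max):
    """Return indices of 2D points that lie inside an axis-aligned rectangle."""
    points = np.asarray(points, dtype=float)
    inside = (
        (points[:, 0] >= x_min) & (points[:, 0] <= x_max)
        & (points[:, 1] >= y_min) & (points[:, 1] <= y_max)
    )
    return np.flatnonzero(inside)

# Hypothetical 2D embedding: one row per chip (or per segment).
embedding = np.array([[0.1, 0.2], [0.8, 0.9], [0.4, 0.5], [0.45, 0.1]])
selected = select_in_box(embedding, 0.0, 0.5, 0.0, 0.6)
```

The returned indices would then be used to look up the chips or segments to display and label together.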