Leveraging Convolutional and Graph Networks for an Unsupervised Remote Sensing Labelling Tool
Tulsi Patel, Mark W. Jones, Thomas Redfern
TL;DR
The paper addresses the high cost of labelling remote sensing (RS) imagery by proposing an unsupervised, context-aware labelling pipeline that fuses CNN-based feature extraction with graph neural networks (GNNs) to encode spatial neighbourhoods. By segmenting images with SLIC, extracting activation maps with a U-Net, and propagating contextual information through a GNN, the method yields a robust, rotation-invariant embedding that can be visualised with UMAP for interactive labelling. The approach is validated through feature- and context-aware evaluations, U-Net benchmarking, and a tool-focused assessment, which show cohesive embeddings, finer labelling granularity, and practical interactive performance. The framework could streamline large-scale RS dataset generation by enabling rapid, fine-grained labelling with less reliance on predefined categories or extensive expert input.
Abstract
Machine learning for remote sensing imaging relies on up-to-date and accurate labels for model training and testing. Labelling remote sensing imagery is time- and cost-intensive, requiring expert analysis. Previous labelling tools rely on pre-labelled data for training in order to label new, unseen data. In this work, we define an unsupervised pipeline for finding and labelling geographical areas of similar context and content within Sentinel-2 satellite imagery. Our approach removes limitations of previous methods by combining segmentation with convolutional and graph neural networks to encode a more robust feature space for image comparison. Unlike previous approaches, we segment the image into homogeneous regions of pixels that are grouped based on colour and spatial similarity. Graph neural networks are used to aggregate information about the surrounding segments, enabling each segment's feature representation to encode its local neighbourhood whilst preserving its own local information. This reduces outliers in the labelling tool, allows users to label at a granular level, and allows a rotationally invariant semantic relationship at the image level to be formed within the encoding space. Our pipeline achieves high contextual consistency, with similarity scores of SSIM = 0.96 and SAM = 0.21 under context-aware evaluation, demonstrating robust organisation of the feature space for interactive labelling.
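The neighbourhood aggregation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a precomputed integer segment map (such as SLIC would produce), a per-pixel feature map standing in for CNN activations, and a single untrained mean-aggregation step with a fixed, hypothetical `self_weight`, where the paper uses a learned graph neural network. All function names are illustrative.

```python
import numpy as np

def segment_means(feature_map, segments):
    """Average the per-pixel features (H, W, C) inside each segment id."""
    n = int(segments.max()) + 1
    flat = feature_map.reshape(-1, feature_map.shape[-1])
    ids = segments.ravel()
    sums = np.zeros((n, flat.shape[1]))
    np.add.at(sums, ids, flat)  # unbuffered scatter-add per segment
    counts = np.bincount(ids, minlength=n)[:, None]
    return sums / counts

def region_adjacency(segments):
    """Undirected edges between segments that share a 4-connected border."""
    edges = set()
    pairs = ((segments[:, :-1], segments[:, 1:]),   # left/right neighbours
             (segments[:-1, :], segments[1:, :]))   # up/down neighbours
    for a, b in pairs:
        mask = a != b
        for u, v in zip(a[mask].tolist(), b[mask].tolist()):
            edges.add((min(u, v), max(u, v)))
    return edges

def aggregate(features, edges, self_weight=0.5):
    """Blend each node's own feature with the mean of its neighbours.

    self_weight is an assumed fixed constant, not a learned parameter.
    """
    n = features.shape[0]
    adj = np.zeros((n, n))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated nodes fall back to their own feature.
    neigh = np.where(deg > 0, adj @ features / np.maximum(deg, 1.0), features)
    return self_weight * features + (1.0 - self_weight) * neigh

def embed(feature_map, segments):
    feats = segment_means(feature_map, segments)
    return aggregate(feats, region_adjacency(segments))

# Toy 4x4 image split into four quadrant segments, with 3 feature channels.
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])
feature_map = np.random.default_rng(0).random((4, 4, 3))

emb = embed(feature_map, segments)
emb_rotated = embed(np.rot90(feature_map), np.rot90(segments))
```

Because the mean over neighbours ignores where each neighbour lies relative to the segment, rotating the image leaves every segment's embedding unchanged in this sketch, so `emb` and `emb_rotated` agree up to floating-point error. This gives one intuition for the rotational invariance the abstract mentions, under the simplifying assumptions stated above.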
