
Semi-supervised segmentation of land cover images using nonlinear canonical correlation analysis with multiple features and t-SNE

Hong Wei, James Xiao, Yichao Zhang, Xia Hong

TL;DR

The paper addresses semi-supervised land-cover segmentation in remote sensing by leveraging a rich pixel-level feature space and an informative 3D t-SNE embedding. It introduces RBF-CCA, a nonlinear extension of canonical correlation analysis that uses labeled samples as radial-basis centres to align the nonlinear t-SNE representation with ground-truth labels, followed by k-means clustering in the canonical space. A novel pixel-based feature set combines a cell patch with LBP and GLCM descriptors across multiple bands, optionally enriched with NDVI and LiDAR-derived bands. Experiments on two remote-sensing datasets show that RBF-CCA substantially outperforms unsupervised baselines and linear/polynomial CCA, achieving high IoU for classes such as buildings, trees, low vegetation, and impervious surfaces with only a small fraction of labeled data. The approach offers a practical, label-efficient solution for accurate semantic segmentation in multispectral, LiDAR-enabled remote sensing tasks.

Abstract

Image segmentation is a clustering task in which each pixel is assigned a cluster label. Remote sensing data usually consist of multiple bands of spectral images containing semantically meaningful land cover subregions, co-registered with other source data such as LIDAR (LIght Detection And Ranging) data where available. This suggests that, to account for spatial correlation between pixels, the feature vector associated with each pixel may be a vectorized tensor representing the multiple bands and a local patch as appropriate. Similarly, multiple types of texture features based on a pixel's local patch are beneficial for encoding local statistical information and spatial variations, without requiring pixel-wise labelling of a large amount of ground truth followed by training a supervised model, which is sometimes impractical. In this work, a new semi-supervised segmentation approach is proposed that requires labels for only a small number of pixels. First, an image data matrix is created over all pixels in a high-dimensional feature space. Then, t-SNE projects the high-dimensional data onto a 3D embedding. A modified canonical correlation analysis algorithm, referred to as RBF-CCA, is introduced: it uses radial basis functions centred on the labelled data samples as input features, pairs them with the output class labels, and learns the associated projection matrix from the small labelled data set. The associated canonical variables, obtained for the full image, are then clustered by the k-means algorithm. The proposed semi-supervised RBF-CCA algorithm has been applied to several remotely sensed multispectral images, demonstrating excellent segmentation results.
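The RBF input features described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel with a single shared width, neither of which the abstract specifies. The names `rbf_features`, `one_hot`, and `width`, and the toy 3D coordinates, are hypothetical and chosen only for illustration. The sketch covers the feature construction only, not the CCA projection or the k-means step.

```python
import math


def rbf_features(points, centres, width=1.0):
    """Map each point to a vector of Gaussian RBF responses, one per centre.

    Assumption: Gaussian kernel exp(-||z - c||^2 / (2 * width^2)); the kernel
    form and the shared width are illustrative, not taken from the paper.
    """
    features = []
    for p in points:
        row = []
        for c in centres:
            sq_dist = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            row.append(math.exp(-sq_dist / (2.0 * width ** 2)))
        features.append(row)
    return features


def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot rows (the 'output' side of CCA)."""
    return [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in labels]


# Hypothetical 3D embedding coordinates for four pixels (stand-in for t-SNE output).
embedding = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.0, 5.1, 5.0)]

# Only pixels 0 and 2 carry labels; they act as the RBF centres.
labelled_idx = [0, 2]
labelled_classes = [0, 1]
centres = [embedding[i] for i in labelled_idx]

# RBF features for every pixel: one column per labelled sample.
phi = rbf_features(embedding, centres, width=1.0)

# Targets exist only for the labelled pixels.
targets = one_hot(labelled_classes, n_classes=2)
```

In this toy setting each pixel responds strongly to the labelled centre it lies near and almost not at all to the distant one, so the feature matrix has as many columns as there are labelled samples. The projection matrix would be learned from the labelled rows of `phi` paired with `targets`, and then applied to all rows.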

Paper Structure (14 sections, 33 equations, 8 figures, 4 tables, 2 algorithms)

Figures (8)

  • Figure 1: Illustrative diagram of constructing data $\boldsymbol{X}$ for t-SNE then RBF-CCA from a remote sensing image $I^{full}$, where each pixel is mapped onto a feature vector consisting of multiple spatial features (for details, see Section 4.2).
  • Figure 2: Exemplary 3D view of the t-SNE embedding of the top-right sub-image (first data set, TopoSys GmbH RS; see Section 5), which shows irregularly shaped clusters.
  • Figure 3: Eight multispectral images of TopoSys GmbH (the first data set, Example 1) (a) Red; (b) Green; (c) Blue; (d) Last echo; (e) First echo; (f) LIDAR intensity; (g) Near infrared; (h) NDVI.
  • Figure 4: Segmentation results of TopoSys GmbH (the first data set, Example 1): (a) RGB view; (b) Ground truth; (c) Segmentation; and (d) Confusion matrix (1 - Building; 2 - Tree; 3 - Low vegetation; 4 - Impervious surface or car).
  • Figure 5: Eight multispectral images of TopoSys GmbH (the second data set, Example 1) (a) Red; (b) Green; (c) Blue; (d) Last echo; (e) First echo; (f) LIDAR intensity; (g) Near infrared; and (h) NDVI
  • ...and 3 more figures
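
As a hedged illustration of how a per-class IoU score relates to a confusion matrix like the one in Figure 4(d), the sketch below computes IoU as TP / (TP + FP + FN) for each class. The counts are made up and do not come from the paper. The function name `per_class_iou` and the row/column convention (rows as ground truth, columns as predictions) are assumptions. The IoU values are unaffected by transposing the matrix, since that only swaps FP and FN.

```python
def per_class_iou(confusion):
    """Return IoU = TP / (TP + FP + FN) for each class of a square confusion matrix.

    Assumption: confusion[i][j] counts pixels of true class i predicted as class j.
    """
    n = len(confusion)
    ious = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # rest of row k
        fp = sum(confusion[r][k] for r in range(n)) - tp     # rest of column k
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


# Made-up pixel counts for four classes, ordered as in the Figure 4 caption:
# building, tree, low vegetation, impervious surface or car.
confusion = [
    [50, 2, 1, 2],
    [3, 40, 5, 0],
    [0, 6, 30, 4],
    [2, 0, 3, 45],
]

ious = per_class_iou(confusion)
mean_iou = sum(ious) / len(ious)
```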