Cross Pseudo Supervision Framework for Sparsely Labelled Geospatial Images

Yash Dixit, Naman Srivastava, Joel D Joy, Rohan Olikara, Swarup E, Rakshit Ramesh

TL;DR

The paper tackles sparse and noisy labeling in Land Use Land Cover (LULC) segmentation from high-resolution Indian satellite imagery. It introduces a Cross Pseudo Supervision (CPS) framework using two DeepLabv3+ models with EfficientNet backbones, combining a supervised loss (including Hausdorff erosion and weighted cross-entropy) with a CPS loss and a ramp-up schedule for semi-supervised learning. Experiments on Cartosat-3 data demonstrate that CPS achieves higher recall across classes than single-model baselines, indicating improved robustness to label sparsity in diverse Indian terrains. The work advances practical, scalable LULC mapping for urban planning and suggests future directions such as dynamic loss weighting and data conditioning (cloud removal, atmospheric correction) to further boost performance.

Abstract

Land Use Land Cover (LULC) mapping is a vital tool for urban and resource planning, playing a key role in the development of innovative and sustainable cities. This study introduces a semi-supervised segmentation model for LULC prediction using high-resolution satellite images whose data distributions vary widely across different areas of India. Our approach is designed to generalize robustly across different types of buildings, roads, trees, and water bodies within these distinct areas. We propose a modified Cross Pseudo Supervision framework to train image segmentation models on sparsely labelled data. The proposed framework addresses the limitations of the well-known 'Cross Pseudo Supervision' technique for semi-supervised learning, specifically tackling the challenges of training segmentation models on noisy satellite image data with sparse and inaccurate labels. This approach enhances the accuracy and utility of LULC mapping, providing valuable insights for urban and resource planning applications.

Paper Structure (21 sections, 7 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Workflow for LULC Segmentation. (a) The merged training masks are created by combining the binary class masks of the relevant cities, which are generated from the JOSM vector files. All remaining areas are assigned to the "Other" class. During training, we focus on a subset of data that is densely populated with classes and extract chips from it. (b) The merged evaluation masks are created from the output binary masks of roads, water, and trees, together with the combined building data (JOSM, Microsoft, and Google); the remaining areas are treated as the "Other" class. During model evaluation, data is chipped without imposing class-heavy constraints.
  • Figure 2: Cross Pseudo Supervision Architecture. $f(\theta_1)$ and $f(\theta_2)$ are the two distinct DeepLabv3+ models being trained. $P_1$ and $P_2$ are the logits generated by the respective models, and $y_1$ and $y_2$ are their corresponding one-hot encoded values. $y^*$ denotes the ground truth labels.
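
The cross-supervision idea in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Plain cross-entropy stands in for the paper's supervised loss (which also includes Hausdorff erosion and class weighting), and the sigmoid-shaped ramp-up is an assumed schedule, since this summary does not give the exact form. The function names (`cps_loss`, `sigmoid_rampup`, `total_loss`) are hypothetical. Pixels are flattened to rows, and the same pixels are used for both loss terms purely for brevity.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, target_idx):
    """Mean cross-entropy; logits has shape (N, C), target_idx has shape (N,)."""
    probs = softmax(logits)
    picked = probs[np.arange(len(target_idx)), target_idx]
    return float(-np.log(picked + 1e-12).mean())

def cps_loss(p1, p2):
    """Each model is trained against the other model's hard pseudo-label."""
    y1 = p1.argmax(axis=-1)  # pseudo-label from model 1
    y2 = p2.argmax(axis=-1)  # pseudo-label from model 2
    # In a real training loop the pseudo-labels carry no gradient.
    return cross_entropy(p1, y2) + cross_entropy(p2, y1)

def sigmoid_rampup(step, rampup_steps, max_weight=1.0):
    """Assumed schedule: max_weight * exp(-5 * (1 - t)^2), with t clipped to [0, 1]."""
    t = np.clip(step / rampup_steps, 0.0, 1.0)
    return float(max_weight * np.exp(-5.0 * (1.0 - t) ** 2))

def total_loss(p1, p2, y_star, step, rampup_steps):
    """Supervised term on ground truth plus the ramped-up CPS term."""
    supervised = cross_entropy(p1, y_star) + cross_entropy(p2, y_star)
    weight = sigmoid_rampup(step, rampup_steps)
    return supervised + weight * cps_loss(p1, p2)
```

Early in training the CPS weight is close to zero, so the models learn mainly from the sparse ground truth. As the weight ramps up, disagreement between the two models is penalized more strongly.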