DiverseNet: Decision Diversified Semi-supervised Semantic Segmentation Networks for Remote Sensing Imagery

Wanli Ma, Oktay Karakus, Paul L. Rosin

TL;DR

The paper tackles the scalability and diversity limitations of semi-supervised semantic segmentation in remote sensing by introducing DiverseHead, a lightweight multi-head approach, and DiverseModel, a cross-model ensemble. DiverseHead employs dynamic freezing and dropout to create diverse head-specific predictions within a single network, using a dual voting scheme to produce robust pseudo labels and a simple loss combining supervised and unsupervised terms. DiverseModel extends this idea to three distinct networks to generate and leverage cross-network pseudo labels, with six pairwise cross-supervision losses and Grad-CAM evidence of complementary attention. Across four remote-sensing datasets, the methods achieve competitive or superior performance while remaining computationally efficient, with DiverseHead offering significant efficiency gains and easy compatibility with existing SSL methods.
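The "six pairwise cross-supervision losses" follow from counting ordered pairs among three networks: each network is supervised by the pseudo labels of the other two, giving 3 × 2 = 6 terms. The sketch below illustrates only that counting. It assumes hard argmax pseudo labels, a plain cross-entropy term, and an unweighted sum, none of which the summary specifies; the function names (`argmax`, `cross_entropy`, `cross_supervision_loss`) and the toy probabilities are hypothetical.

```python
import math
from itertools import permutations


def argmax(values):
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of hard labels under per-pixel class probabilities."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def cross_supervision_loss(model_probs):
    """model_probs[m][p][c]: probability of class c at pixel p from model m.

    Assumption: every ordered pair (i, j) with i != j adds one term in which
    model i is trained on the hard pseudo labels produced by model j.
    """
    pseudo = [[argmax(pixel) for pixel in probs] for probs in model_probs]
    pairs = list(permutations(range(len(model_probs)), 2))
    total = sum(cross_entropy(model_probs[i], pseudo[j]) for i, j in pairs)
    return total, pairs


# Toy example: three models, two pixels, two classes.
model_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.6, 0.4], [0.7, 0.3]],
]
loss, pairs = cross_supervision_loss(model_probs)
print(len(pairs))  # 6 ordered pairs for three models
```

In a real training loop the pseudo labels would be treated as constants (no gradient flows through them); this plain-Python version has no gradients, so that detail does not appear here.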

Abstract

Semi-supervised learning (SSL) aims to reduce the cost of manual labelling by leveraging a substantial pool of unlabelled data alongside a limited set of labelled data during training. Since pixel-level manual labelling of large-scale remote sensing imagery is expensive and time-consuming, semi-supervised learning has become a widely used solution. However, the majority of existing SSL frameworks, especially the various teacher-student frameworks, are too bulky to run efficiently on a GPU with limited memory. There is still a lack of lightweight SSL frameworks and efficient perturbation methods that promote the diversity of training samples and enhance the precision of pseudo labels during training. To fill this gap, we propose a simple, lightweight, and efficient SSL architecture named DiverseHead, which uses multiple decision heads instead of multiple whole networks. Another limitation of most existing SSL frameworks is the insufficient diversity of pseudo labels, as they rely on the same network architecture and fail to explore different structures for generating pseudo labels. To solve this issue, we propose DiverseModel, which explores and analyses different networks in parallel for SSL to increase the diversity of pseudo labels. The two proposed methods, DiverseHead and DiverseModel, both achieve competitive semantic segmentation performance on four widely used remote sensing imagery datasets compared to state-of-the-art semi-supervised learning methods. Meanwhile, the proposed lightweight DiverseHead architecture can be easily applied to various state-of-the-art SSL methods while further improving their performance. The code is available at https://github.com/WANLIMA-CARDIFF/DiverseNet.

Paper Structure (11 sections, 7 equations, 9 figures, 7 tables, 3 algorithms)

Figures (9)

  • Figure 1: Two kinds of pseudo label generation and usage methods for SSL based on (a) DiverseHead with multiple heads and (b) DiverseModel with multiple models. '$\longrightarrow$' means data stream, '$\dashrightarrow$' means loss supervision. 'Dynamic freezing' and 'dropout' are used as perturbation methods in the DiverseHead framework.
  • Figure 2: DiverseHead: an online semi-supervised learning approach. This figure applies the dynamic freezing strategy: the freezers (Dynamic Selector in the figure) randomly select a certain number of heads and freeze their parameters (so they are not updated by backpropagation). Additionally, during every iteration, all heads undergo supervision through a supervised loss, yet each head is randomly chosen to be updated by an unsupervised loss.
  • Figure 3: The Proposed Voting Module: a voting mechanism for the pseudo label creation. In the unsupervised part, the voting module combines the mean output of multiple heads (mean voting) and individual pseudo labels (max voting) to generate more efficient pseudo labels. Argmax returns the indices of the maximum values of the prediction along the class dimension. The dashed arrow serves as an illustration of a pixel voting for its classification in a segmentation map.
  • Figure 4: The two proposed perturbation methods: (A) Dynamic Freezing and (B) Dropout. Dynamic Freezing is designed to enhance parameter diversity across multiple heads. During each training iteration, a specific subset of heads is randomly selected (the Dynamic Selector in the figure is used for this purpose), and their parameters are frozen, meaning they are not updated by minimising the loss in that iteration. These parameters are unfrozen before the next iteration begins. For Dropout, by contrast, each channel of the features passed through each head is independently zeroed out with a dropout rate $p$ during each forward pass.
  • Figure 5: DiverseModel: an online semi-supervised learning approach.
  • ...and 4 more figures
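
The voting module (Figure 3) and the dynamic selector (Figure 4) described in the captions above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the captions do not say how mean voting and max voting are merged, so the sketch simply adds the mean-vote label as one extra ballot next to each head's argmax and breaks ties towards the lowest class index. The names `vote_pseudo_labels` and `select_frozen_heads` are hypothetical and are not taken from the authors' code.

```python
import random
from collections import Counter


def argmax(values):
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


def vote_pseudo_labels(head_probs):
    """head_probs[h][p][c]: probability of class c at pixel p from head h.

    Returns one pseudo label per pixel.
    """
    num_heads = len(head_probs)
    num_pixels = len(head_probs[0])
    num_classes = len(head_probs[0][0])
    labels = []
    for p in range(num_pixels):
        # Mean voting: argmax of the class scores averaged over heads.
        mean_scores = [
            sum(head_probs[h][p][c] for h in range(num_heads)) / num_heads
            for c in range(num_classes)
        ]
        ballots = [argmax(mean_scores)]
        # Max voting: each head casts its own argmax as a ballot.
        ballots += [argmax(head_probs[h][p]) for h in range(num_heads)]
        counts = Counter(ballots)
        best = max(counts.values())
        # Assumed tie-break: the lowest class index among the most voted classes.
        labels.append(min(c for c, n in counts.items() if n == best))
    return labels


def select_frozen_heads(num_heads, num_frozen, rng):
    """Randomly pick the heads whose parameters are frozen for one iteration."""
    return set(rng.sample(range(num_heads), num_frozen))


# Toy example: three heads, two pixels, two classes.
head_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.45, 0.55]],
    [[0.3, 0.7], [0.6, 0.4]],
]
pseudo = vote_pseudo_labels(head_probs)
frozen = select_frozen_heads(num_heads=3, num_frozen=1, rng=random.Random(0))
print(pseudo)  # [0, 1]
```

In a training loop, the heads in `frozen` would be excluded from the parameter update for the current iteration and a fresh subset would be drawn at the next one, matching the "unfrozen before the next iteration begins" behaviour described in the Figure 4 caption.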