Complementary Random Masking for RGB-Thermal Semantic Segmentation

Ukcheol Shin, Kyunghyun Lee, In So Kweon, Jean Oh

TL;DR

This paper proposes a complementary random masking strategy for RGB-T images and a self-distillation loss between clean and masked input modalities. Together, they prevent over-reliance on a single modality and improve the accuracy and robustness of the neural network.

Abstract

RGB-thermal semantic segmentation is one potential solution for achieving reliable semantic scene understanding in adverse weather and lighting conditions. However, previous studies mostly focus on designing multi-modal fusion modules without considering the nature of multi-modal inputs. As a result, the networks easily become over-reliant on a single modality, making it difficult to learn complementary and meaningful representations for each modality. This paper proposes 1) a complementary random masking strategy for RGB-T images and 2) a self-distillation loss between clean and masked input modalities. The proposed masking strategy prevents over-reliance on a single modality. It also improves the accuracy and robustness of the neural network by forcing the network to segment and classify objects even when a modality is only partially available. In addition, the proposed self-distillation loss encourages the network to extract complementary and meaningful representations from a single modality or from complementary masked modalities. Based on the proposed method, we achieve state-of-the-art performance on three RGB-T semantic segmentation benchmarks. Our source code is available at https://github.com/UkcheolShin/CRM_RGBTSeg.
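
The complementary masking idea in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository: the function names `complementary_patch_masks` and `apply_masks`, the zero fill for masked pixels, and the choice to make the thermal mask the exact complement of the RGB mask are all assumptions made for illustration.

```python
import numpy as np


def complementary_patch_masks(height, width, patch=16, ratio=0.5, seed=None):
    """Return two pixel-level keep-masks (1 = keep, 0 = masked).

    Assumption: the thermal mask is the exact complement of the RGB mask,
    so every patch stays visible in exactly one of the two modalities.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of patch")
    rng = np.random.default_rng(seed)
    grid_h, grid_w = height // patch, width // patch
    n_patches = grid_h * grid_w
    n_masked = int(round(ratio * n_patches))

    # Pick which patches to hide in the RGB image.
    flat = np.ones(n_patches, dtype=np.float32)
    flat[rng.permutation(n_patches)[:n_masked]] = 0.0
    grid = flat.reshape(grid_h, grid_w)

    # Expand the patch grid to pixel resolution.
    rgb_keep = np.kron(grid, np.ones((patch, patch), dtype=np.float32))
    thermal_keep = 1.0 - rgb_keep
    return rgb_keep, thermal_keep


def apply_masks(rgb, thermal, rgb_keep, thermal_keep):
    """Zero out masked pixels; rgb is (H, W, 3), thermal is (H, W, 1)."""
    return rgb * rgb_keep[..., None], thermal * thermal_keep[..., None]


# Hypothetical usage on dummy all-ones images.
rgb_keep, thermal_keep = complementary_patch_masks(64, 64, patch=16, seed=0)
rgb = np.ones((64, 64, 3), dtype=np.float32)
thermal = np.ones((64, 64, 1), dtype=np.float32)
masked_rgb, masked_thermal = apply_masks(rgb, thermal, rgb_keep, thermal_keep)
```

Because the two keep-masks sum to one at every pixel, no image region is hidden in both modalities at once, which is the property the masking strategy relies on.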

Paper Structure (24 sections, 6 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Complementary random masking for RGB-thermal semantic segmentation. Our proposed method aims to learn meaningful and complementary representations from RGB and thermal images by using complementary masking of RGB-T inputs and ensuring consistency between augmented and original inputs. The proposed method leads to robust and reliable segmentation results in day-light, low-light, and modality-dropped scenarios.
  • Figure 2: Input modality dependency comparison of RGB-T semantic segmentation networks. Common multi-modal fusion approaches often result in a sub-optimal solution, where the neural network becomes over-reliant on a single modality, as shown in (e) and (f). On the other hand, our proposed method prevents the over-reliance issue (i.e., (h) and (i)).
  • Figure 3: Overall pipeline of complementary masking and self-distillation for RGB-thermal semantic segmentation. Our proposed training framework consists of complementary random masking and a self-distillation loss. We randomly mask the patchified RGB-thermal pair in a complementary manner that guarantees at least one modality is valid. After that, the network predicts segmentation results for both the clean and the masked RGB-thermal pairs. We enforce the network to produce the same class predictions from the clean and masked RGB-thermal pairs. The proposed method resolves the over-reliance problem of RGB-T semantic segmentation networks and encourages the network to extract complementary and meaningful representations for robust and accurate semantic segmentation from RGB-T images.
  • Figure 4: Qualitative comparison for semantic segmentation of RGB-T images on the MFNet [ha2017mfnet], PST900 [shivakumar2019pst900], and KP [hwang2015multispectral] datasets. The first two rows show results on the MFNet dataset, the next two rows on the PST900 dataset, and the remaining rows on the KP dataset. The proposed method shows reliable and accurate segmentation results across all datasets, including day-light, low-light, noisy images, and harsh cave conditions. Further results can be found in the supplementary video.
  • Figure 5: Illustration of various complementary masking strategies. Square masking masks a square area with half the height and half the width of the image at a random position. Patch masking randomly masks half of the image (i.e., a masking ratio of 0.5) with patches of different sizes (e.g., 8, 16, 32, 64).
  • ...and 3 more figures
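
The consistency constraint described in the Figure 3 caption (same class predictions from clean and masked RGB-thermal pairs) can be sketched as a per-pixel loss. The captions above do not give the form of the self-distillation loss, so the soft cross-entropy used here is an assumption, as are the names `log_softmax` and `consistency_loss`. In a real training loop the clean prediction would typically be treated as a fixed target; that is also an assumption, not something stated above.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def consistency_loss(clean_logits, masked_logits):
    """Mean per-pixel soft cross-entropy between two predictions.

    Both inputs have shape (H, W, C). The clean prediction acts as the
    target distribution (assumed form, not the paper's stated loss).
    """
    target = np.exp(log_softmax(clean_logits))
    log_pred = log_softmax(masked_logits)
    return float(-(target * log_pred).sum(axis=-1).mean())


# Hypothetical usage with random logits standing in for network outputs.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 4, 9))
loss_same = consistency_loss(clean, clean)       # equals the target entropy
loss_diff = consistency_loss(clean, -clean)      # disagreeing prediction
```

The loss is smallest when the masked-input prediction matches the clean-input prediction, so minimizing it pushes the network to produce the same segmentation even when parts of each modality are hidden.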