Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments

Meng Yu, Luojie Yang, Xunjie He, Yi Yang, Yufeng Yue

TL;DR

This paper presents Open-RGBT, a novel open-vocabulary RGB-T semantic segmentation model that achieves superior performance in diverse and challenging real-world scenarios, even in the wild, significantly advancing the field of RGB-T semantic segmentation.

Abstract

Semantic segmentation is a critical technique for effective scene understanding. Traditional RGB-T semantic segmentation models often struggle to generalize across diverse scenarios due to their reliance on pretrained models and predefined categories. Recent advancements in Visual Language Models (VLMs) have facilitated a shift from closed-set to open-vocabulary semantic segmentation methods. However, these models face challenges in dealing with intricate scenes, primarily due to the heterogeneity between RGB and thermal modalities. To address this gap, we present Open-RGBT, a novel open-vocabulary RGB-T semantic segmentation model. Specifically, we obtain instance-level detection proposals by incorporating visual prompts to enhance category understanding. Additionally, we employ the CLIP model to assess image-text similarity, which helps correct semantic consistency and mitigates ambiguities in category identification. Empirical evaluations demonstrate that Open-RGBT achieves superior performance in diverse and challenging real-world scenarios, even in the wild, significantly advancing the field of RGB-T semantic segmentation.
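
To make the image-text similarity step in the abstract concrete, the following is a minimal sketch of how a CLIP-style score could be used to re-check a detector's category label. It is not the authors' implementation: the function names, the `min_confidence` threshold, and the toy 3-dimensional embeddings are all assumptions for illustration, and a real pipeline would obtain the embeddings from CLIP's image and text encoders. The temperature of 0.01 only roughly mirrors the logit scaling CLIP applies before its softmax.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores, temperature=0.01):
    """Numerically stable softmax over temperature-scaled similarity scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def correct_label(region_embedding, text_embeddings, detector_label,
                  min_confidence=0.5):
    """Hypothetical helper: override the detector's label when the
    image-text similarity clearly favours a different category.

    text_embeddings maps category name -> embedding and must contain
    detector_label. Returns (label, probability of that label).
    """
    labels = list(text_embeddings)
    sims = [cosine_similarity(region_embedding, text_embeddings[name])
            for name in labels]
    probs = softmax(sims)
    best = max(range(len(labels)), key=probs.__getitem__)
    if labels[best] != detector_label and probs[best] >= min_confidence:
        return labels[best], probs[best]
    return detector_label, probs[labels.index(detector_label)]

# Toy embeddings (assumed values, not real CLIP outputs).
text_embeddings = {
    "person": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
}
region_embedding = [0.9, 0.1, 0.0]  # region that looks most like "person"

label, confidence = correct_label(region_embedding, text_embeddings, "car")
print(label, round(confidence, 3))
```

In this toy case the detector's label "car" is replaced by "person", because the region embedding is far closer to the "person" text embedding. The threshold keeps the detector's label whenever the similarity scores are ambiguous.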

Paper Structure

This paper contains 22 sections, 7 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: We introduce Open-RGBT, a novel open-vocabulary RGB-T semantic segmentation model, which facilitates zero-shot semantic segmentation across various open-world scenarios.
  • Figure 2: The overall framework of Open-RGBT consists of two stages: RGB-T Open-vocabulary Object Detection and Scene Semantic Understanding.
  • Figure 3: Semantic Consistency Correction Module.
  • Figure 4: Our experimental platform and constructed MSVID dataset.
  • Figure 5: The sample qualitative results on multiple datasets. Please zoom in for best view.
  • ...and 1 more figure