A Volumetric Saliency Guided Image Summarization for RGB-D Indoor Scene Classification

Preeti Meena, Himanshu Kumar, Sandeep Yadav

TL;DR

This work addresses RGB-D indoor scene classification by producing compact, informative summaries that reflect stationary, scene-defining content. It introduces a three-stage framework that merges visuo-spatial features with a probabilistic segmentation, guided by volumetric saliency, to select scene-defining objects, including those in the background. A two-stage fusion of geometric and visual cues, plus an object-classifier-based refinement, yields robust saliency maps and highly classification-relevant summaries. The approach demonstrates superior quantitative and qualitative performance on multiple RGB-D datasets and shows clear benefits for scene identification tasks in cluttered indoor environments.

Abstract

An image summary, an abridged version of the original visual content, can be used to represent the scene. Thus, tasks such as scene classification, identification, and indexing can be performed efficiently using the unique summary. Saliency is the most commonly used technique for generating a relevant image summary. However, the definition of saliency is subjective and depends on the application. Existing saliency detection methods using RGB-D data mainly focus on color, texture, and depth features. Consequently, the generated summary contains either foreground objects or non-stationary objects. Applications such as scene identification, however, require the stationary characteristics of the scene, which state-of-the-art methods do not capture. This paper proposes a novel volumetric saliency-guided framework for indoor scene classification. The results highlight the efficacy of the proposed method.

Paper Structure (18 sections, 6 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Proposed volumetric saliency guided summarization method.
  • Figure 2: Segmentation comparison of the modified SLIC, which uses additional spatial features, with the original SLIC. (a) RGB image; (b)-(c) segmentation results using the original SLIC [achanta2012slic] and the modified SLIC (Eq. \ref{eq:direcslic}) with additional spatial features, respectively.
  • Figure 3: A superpixel example and feature vector.
  • Figure 4: Region merging based on (a) depth only, (b) depth and direction only, and (c) multiple parameters as in Eq. \ref{eqn:similarity}.
  • Figure 5: Saliency map comparison with SOTA methods. Top row: RGB image. Second to bottom rows: [lou2020exploiting], [cong2019going], [fan2020rethinking], [zhang2022c], [zhang2021bilateral], [li2023mutual], and the proposed method, respectively.
  • ...and 1 more figure
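
The caption of Figure 2 mentions a SLIC variant that uses additional spatial features, but the corresponding equation is not reproduced in this summary. The sketch below is therefore only an illustration of the general idea, not the authors' formulation: it starts from the standard SLIC distance (color distance plus grid-normalized spatial distance) and adds two hypothetical terms, a depth difference and a surface-normal disagreement. The dictionary keys, the weights `w_depth` and `w_normal`, and the way the terms are combined are all assumptions made for this example.

```python
import math

def slic_distance(pixel, center, S, m=10.0):
    """Standard SLIC distance: Lab colour distance combined with the
    spatial distance normalised by the grid interval S and weighted by m."""
    d_c = math.dist(pixel["lab"], center["lab"])
    d_s = math.dist(pixel["xy"], center["xy"])
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)

def augmented_distance(pixel, center, S, m=10.0, w_depth=1.0, w_normal=1.0):
    """Hypothetical extension (an assumption, not the paper's equation):
    add a depth term and a surface-normal term to the SLIC distance."""
    base = slic_distance(pixel, center, S, m)
    d_z = abs(pixel["depth"] - center["depth"])
    # Normals are assumed to be unit vectors; clamp the dot product for safety.
    dot = sum(a * b for a, b in zip(pixel["normal"], center["normal"]))
    d_n = 1.0 - max(-1.0, min(1.0, dot))  # 0 when parallel, 2 when opposite
    return math.sqrt(base ** 2 + (w_depth * d_z) ** 2 + (w_normal * d_n) ** 2)

# Two samples with identical colour, position, and depth but perpendicular
# surface orientation, e.g. a wall meeting a floor of the same colour.
wall = {"lab": (50.0, 0.0, 0.0), "xy": (10.0, 10.0), "depth": 2.0,
        "normal": (0.0, 0.0, 1.0)}
floor = {"lab": (50.0, 0.0, 0.0), "xy": (10.0, 10.0), "depth": 2.0,
         "normal": (0.0, 1.0, 0.0)}

plain = slic_distance(wall, floor, S=20.0)
augmented = augmented_distance(wall, floor, S=20.0)
print(plain, augmented)
```

In this toy case the standard distance cannot separate the two samples, while the augmented one can, which is the kind of behavior a geometry-aware superpixel step would need in cluttered indoor scenes.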