Inter-object Discriminative Graph Modeling for Indoor Scene Recognition

Chuanxin Song, Hanbo Wu, Xin Ma

TL;DR

This paper tackles indoor scene recognition by addressing how discriminative object knowledge can be leveraged to distinguish visually similar environments. It introduces Inter-Object Discriminative Prototype (IODP), a probabilistically derived prior that captures discriminative correlations among objects, and a Discriminative Graph Network (DGN) that encodes these relationships as edges in a pixel-level graph over scene features. By applying a single graph convolutional layer with a backbone-aware auxiliary loss, the approach refines pixel representations to emphasize discriminative object regions, achieving improvements over baselines and competitive state-of-the-art results on MIT-67, SUN397, Places_7, and Places_14. The method demonstrates strong generalization and efficiency benefits, while acknowledging limitations tied to segmentation accuracy and object coverage, with future work aiming to remove reliance on explicit segmentation.

Abstract

Variable scene layouts and objects that co-occur across scenes keep indoor scene recognition a challenging task. Leveraging object information within scenes to make feature representations more distinguishable has emerged as a key approach in this domain. Currently, most object-assisted methods process object information in a separate branch and combine object and scene features heuristically. However, few of them handle the discriminative knowledge hidden within object information in an interpretable way. In this paper, we propose to leverage discriminative object knowledge to enhance scene feature representations. First, we capture the object-scene discriminative relationships from a probabilistic perspective and transform them into an Inter-Object Discriminative Prototype (IODP). Building on the abundant prior knowledge in IODP, we then construct a Discriminative Graph Network (DGN), in which pixel-level scene features are defined as nodes and the discriminative relationships between node features are encoded as edges. DGN incorporates inter-object discriminative knowledge into the image representation through graph convolution and mapping operations (GCN). With the proposed IODP and DGN, we obtain state-of-the-art results on several widely used scene datasets, demonstrating the effectiveness of the proposed approach.
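
The graph described in the abstract (pixel-level features as nodes, discriminative relationships as edges, followed by graph convolution) can be illustrated with a small sketch. This is not the authors' implementation: it assumes each pixel carries an object label from a segmentation map, that the edge weight between two pixels is looked up in an object-by-object prototype matrix, and that the adjacency is row-normalized with self-loops. The names `build_adjacency`, `gcn_layer`, and the toy `prototype` values are hypothetical.

```python
# Minimal sketch, assuming edge weights come from an object-by-object prior.
# All names and numbers here are illustrative, not taken from the paper.

def build_adjacency(labels, prototype):
    """Edge weight between pixels p and q is prototype[labels[p]][labels[q]]."""
    n = len(labels)
    adj = [[prototype[labels[p]][labels[q]] for q in range(n)] for p in range(n)]
    for p in range(n):
        adj[p][p] += 1.0  # self-loop so every node keeps part of its own feature
    # Row-normalize so the weights of each node's neighbours sum to 1.
    return [[w / sum(row) for w in row] for row in adj]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(adj @ feats @ weight)."""
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    # Aggregate neighbour features along the graph edges.
    agg = [[sum(adj[p][q] * feats[q][k] for q in range(n)) for k in range(d_in)]
           for p in range(n)]
    # Linear mapping followed by ReLU.
    return [[max(0.0, sum(agg[p][k] * weight[k][j] for k in range(d_in)))
             for j in range(d_out)] for p in range(n)]

# Toy input: 4 pixels, 3 object classes, 2-dimensional pixel features.
labels = [0, 0, 1, 2]                      # object label per pixel (assumed given)
prototype = [[0.0, 0.8, 0.1],              # hypothetical object-pair prior
             [0.8, 0.0, 0.3],
             [0.1, 0.3, 0.0]]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weight = [[1.0, 0.0], [0.0, 1.0]]          # identity mapping, for readability

adj = build_adjacency(labels, prototype)
out = gcn_layer(adj, feats, weight)
```

In this toy setup, pixels whose object pair has a large prior value exchange more of their features, which is the intuition behind encoding discriminative relationships as edges. A dense pixel-by-pixel adjacency grows quadratically with the number of pixels, so a practical version would operate on a downsampled feature map.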

Paper Structure (28 sections, 8 equations, 10 figures, 7 tables, 2 algorithms)

Figures (10)

  • Figure 1: Example images from the MIT-67 dataset. The four columns show four different scene categories ("Library," "Shoeshop," "Computer room," and "Dining room"). The spatial layout varies within a category, while the layouts of different scenes can be similar (e.g., "Library" and "Shoeshop"), and the same objects can appear in different scenes (e.g., "Chair" in both "Computer room" and "Dining room").
  • Figure 2: Illustrations of existing and proposed methods. (a) feature response-based method [yuan2019acmr2, chen2020scener3, zhao2018volcanor5, lin2022scener4], (b) separate object branch-based method (without knowledge) [lopez2020semanticr9, song2023Srrmr10, sceneessencer11, sitaula2021contentr40], (c) separate object branch-based method (with knowledge) [cheng2018scener7, pereira2021deepr13, zhou2021bormr14, choe2021indoorr16], and (d) the proposed method.
  • Figure 3: Overall workflow of the proposed approach for scene recognition. We first compute statistics over all the training data and analyze them to obtain the Inter-Object Discriminative Prototype (IODP). Then, we build a Discriminative Graph Network (DGN) upon IODP to integrate discriminative object knowledge into image representations.
  • Figure 4: The process of constructing the Inter-Object Discriminative Prototype (IODP). The whole process is based on the training data from scene datasets. Different colored blocks represent different scene categories, where $S_c$ denotes the $c$-th scene category. The resulting IODP is an $L \times L$ matrix, where $L$ denotes the number of object classes that the semantic segmentation technique can segment. $\Omega_{i,j}$
  • Figure 5: Visualization of the posterior probabilities of different scene categories given several object pairs in the MIT-67 and Places_14 datasets.
  • ...and 5 more figures
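
The captions of Figures 4 and 5 refer to statistics gathered over the training data and to posterior probabilities of scene categories given object pairs. The sketch below shows one way such posteriors could be estimated from co-occurrence counts. It is an illustration under stated assumptions, not the paper's procedure: the excerpt does not give the definition of $\Omega_{i,j}$, so the `peak_posterior_matrix` helper is a hypothetical stand-in that simply stores the largest posterior for each object pair in an $L \times L$ matrix. The function names and the toy samples are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

def pair_posteriors(samples, num_scenes):
    """Empirical P(S_c | o_i, o_j) from (scene_id, object_id_set) training samples."""
    counts = defaultdict(lambda: [0] * num_scenes)
    for scene, objects in samples:
        # Count every unordered pair of objects present in this image.
        for i, j in combinations(sorted(objects), 2):
            counts[(i, j)][scene] += 1
    # Normalize the per-scene counts of each pair into a distribution over scenes.
    return {pair: [c / sum(per_scene) for c in per_scene]
            for pair, per_scene in counts.items()}

def peak_posterior_matrix(posteriors, num_objects):
    """Hypothetical stand-in score: the largest posterior for each object pair."""
    matrix = [[0.0] * num_objects for _ in range(num_objects)]
    for (i, j), probs in posteriors.items():
        matrix[i][j] = matrix[j][i] = max(probs)  # symmetric in the two objects
    return matrix

# Toy training set: (scene category, objects found by segmentation).
samples = [
    (0, {0, 1}),
    (0, {0, 1, 2}),
    (1, {0, 2}),
    (1, {1, 2}),
]

posteriors = pair_posteriors(samples, num_scenes=2)
matrix = peak_posterior_matrix(posteriors, num_objects=3)
```

In the toy data, objects 0 and 1 co-occur only in scene 0, so that pair gets a posterior of 1.0 for scene 0 and is highly discriminative, whereas the pairs involving object 2 are split evenly between the two scenes.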