Inter-object Discriminative Graph Modeling for Indoor Scene Recognition
Chuanxin Song, Hanbo Wu, Xin Ma
TL;DR
This paper tackles indoor scene recognition by asking how discriminative object knowledge can be used to distinguish visually similar environments. It introduces the Inter-Object Discriminative Prototype (IODP), a probabilistically derived prior that captures discriminative correlations among objects, and a Discriminative Graph Network (DGN) that encodes these relationships as edges in a pixel-level graph over scene features. Using a single graph convolutional layer with a backbone-aware auxiliary loss, the approach refines pixel representations to emphasize discriminative object regions, improving over baselines and reaching results competitive with the state of the art on MIT-67, SUN397, Places_7, and Places_14. The method shows strong generalization and efficiency benefits. Its acknowledged limitations are tied to segmentation accuracy and object coverage, and future work aims to remove the reliance on explicit segmentation.
Abstract
Variable scene layouts and objects that coexist across scenes keep indoor scene recognition a challenging task. Leveraging object information within scenes to enhance the distinguishability of feature representations has emerged as a key approach in this domain. Currently, most object-assisted methods use a separate branch to process object information and combine object and scene features heuristically. However, few of them attend to handling the hidden discriminative knowledge within object information in an interpretable way. In this paper, we propose to leverage discriminative object knowledge to enhance scene feature representations. First, we capture the object-scene discriminative relationships from a probabilistic perspective and transform them into an Inter-Object Discriminative Prototype (IODP). Given the abundant prior knowledge from IODP, we then construct a Discriminative Graph Network (DGN), in which pixel-level scene features are defined as nodes and the discriminative relationships between node features are encoded as edges. DGN aims to incorporate inter-object discriminative knowledge into the image representation through graph convolution (GCN) and mapping operations. With the proposed IODP and DGN, we obtain state-of-the-art results on several widely used scene datasets, demonstrating the effectiveness of the proposed approach.
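To make the graph construction concrete, the following is a minimal sketch of one graph-convolution step over pixel nodes, written in NumPy. It is an illustration under stated assumptions rather than the authors' implementation: the prototype is assumed to be a non-negative K x K object-to-object matrix, each pixel is assumed to carry an object label from a segmentation map, and the edge weight between two pixels is assumed to be a lookup into the prototype by their labels. The function name `pixel_graph_conv` and the self-loop and row-normalization choices are hypothetical.

```python
import numpy as np


def pixel_graph_conv(features, labels, prototype, weight):
    """One graph-convolution step over pixel nodes (illustrative sketch).

    features:  (N, C) array, one feature vector per pixel (the graph nodes).
    labels:    (N,) integer array, object label per pixel (assumed to come
               from a segmentation map).
    prototype: (K, K) non-negative array, a stand-in for an inter-object
               prior; the real IODP may be defined differently.
    weight:    (C, C_out) projection matrix (learned in a real model).
    """
    # Assumption: the edge between pixels i and j is the prototype entry
    # for their object labels.
    adjacency = prototype[np.ix_(labels, labels)]          # (N, N)
    # Self-loops keep each pixel's own feature in the aggregation.
    adjacency = adjacency + np.eye(labels.shape[0])
    # Row-normalize so each node takes a weighted average of its neighbours.
    adjacency = adjacency / adjacency.sum(axis=1, keepdims=True)
    # Aggregate neighbours, project, and apply ReLU.
    return np.maximum(adjacency @ features @ weight, 0.0)


# Toy usage: 4 pixels, 3 feature channels, 3 object classes.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(4, 3))
toy_labels = np.array([0, 0, 1, 2])
toy_prototype = rng.uniform(size=(3, 3))
toy_weight = rng.normal(size=(3, 2))
refined = pixel_graph_conv(toy_features, toy_labels, toy_prototype, toy_weight)
```

In this sketch, an all-zero prototype reduces the layer to a per-pixel projection, so any change in the refined features comes from the object-level prior carried on the edges.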
