Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution Can Improve Robust Scene Graph Generation

Changsheng Lv; Zijian Fu; Mengshi Qi

TL;DR

Robo-SGG addresses robustness in scene graph generation (SGG) under image corruptions by leveraging global layout information. It introduces two modules. The first is Layout-Oriented Normalization and Restitution (NRM), which stabilizes feature maps via Instance Normalization followed by layout-aware restitution. The second is the Layout-Embedded Encoder (LEE), which adaptively fuses spatial and visual cues through a gating mechanism. The approach is plug-and-play and improves robustness across multiple baselines. It achieves state-of-the-art results on the corruption benchmarks VG-C and GQA-C with favorable efficiency. This work offers a practical solution to domain shift in SGG, emphasizing structural feature stability over purely appearance-based cues.
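To make the "normalize, then restitute" idea concrete, here is a minimal sketch under stated assumptions. It is not the paper's formulation. Instance Normalization is standard: each channel of each image is normalized with its own spatial mean and variance, which strips image-specific contrast and brightness shifts. The restitution step is a hypothetical stand-in. It assumes a layout mask with values in [0, 1] (for example, rasterized object boxes) that decides how much of the residual discarded by IN is added back. The names `instance_norm`, `layout_restitution`, and `layout_mask` are illustrative only.

```python
import math
from typing import List

# One H x W feature-map channel for one image.
FeatureMap = List[List[float]]


def instance_norm(channel: FeatureMap, eps: float = 1e-5) -> FeatureMap:
    """Instance Normalization of a single channel (no affine scale/shift).

    Mean and variance are computed over this channel's spatial positions only,
    so per-image appearance statistics are removed.
    """
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in row] for row in channel]


def layout_restitution(channel: FeatureMap, layout_mask: FeatureMap) -> FeatureMap:
    """Hypothetical layout-guided restitution (illustrative, not the paper's).

    IN discards the residual R = F - IN(F). The mask (values in [0, 1]) sets
    how much of R is restored per position: 1 keeps the original feature,
    0 keeps the normalized feature.
    """
    normed = instance_norm(channel)
    restored = []
    for f_row, n_row, m_row in zip(channel, normed, layout_mask):
        restored.append([n + m * (f - n) for f, n, m in zip(f_row, n_row, m_row)])
    return restored


if __name__ == "__main__":
    feature = [[1.0, 2.0], [3.0, 4.0]]
    mask = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical: one "object" in the top-left cell
    print(layout_restitution(feature, mask))
```

In a real network, this would run over a C x H x W tensor, with a learned restitution rather than a fixed mask. The single-channel list version just keeps the arithmetic visible.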

Abstract

In this paper, we propose Robo-SGG, a plug-and-play module for robust scene graph generation (SGG). Unlike standard SGG, robust SGG aims to perform inference on a diverse range of corrupted images. Its core challenge is the domain shift between clean and corrupted images. Existing SGG methods suffer from degraded performance due to shifted visual features (e.g., from corruption interference or occlusions). To obtain robust visual features, we leverage layout information, which represents the global structure of an image and is robust to domain shift, to enhance the robustness of SGG methods under corruption. Specifically, we employ Instance Normalization (IN) to alleviate domain-specific variations. We then recover robust structural features (i.e., the positional and semantic relationships among objects) with the proposed Layout-Oriented Restitution. Furthermore, to handle corrupted images, we introduce a Layout-Embedded Encoder (LEE) that adaptively fuses layout and visual features via a gating mechanism, enhancing the robustness of positional and semantic representations for objects and predicates. Note that Robo-SGG is designed as a plug-and-play component that can be easily integrated into any baseline SGG model. Extensive experiments demonstrate the benefit of integrating Robo-SGG into the state-of-the-art method. On the VG-C benchmark, we achieve relative improvements in mR@50 of 6.3%, 11.1%, and 8.0% for the PredCls, SGCls, and SGDet tasks, respectively. We also achieve new state-of-the-art performance on the corruption benchmarks for scene graph generation (VG-C and GQA-C). We will release our source code and model.
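The abstract states only that LEE fuses layout and visual features "via a gating mechanism." One common form of such a gate is sketched below as an assumption, not as the paper's encoder. A sigmoid gate computed from both inputs takes a convex combination of the two features, so a gate near 1 trusts appearance and a gate near 0 falls back on layout. The per-dimension (diagonal) weights `w_visual`, `w_layout`, and `bias` are hypothetical simplifications. A real encoder would more likely apply a full learned linear layer to the concatenated features. The `layout` vector is assumed to be some embedding of object positions (e.g., box coordinates).

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(visual: Sequence[float], layout: Sequence[float],
                 w_visual: Sequence[float], w_layout: Sequence[float],
                 bias: Sequence[float]) -> List[float]:
    """Hypothetical element-wise gated fusion of visual and layout features.

    For each dimension i:
        g_i   = sigmoid(w_visual[i] * visual[i] + w_layout[i] * layout[i] + bias[i])
        out_i = g_i * visual[i] + (1 - g_i) * layout[i]
    """
    dims = {len(visual), len(layout), len(w_visual), len(w_layout), len(bias)}
    if len(dims) != 1:
        raise ValueError("all inputs must have the same dimension")
    fused = []
    for v, l, a, c, b in zip(visual, layout, w_visual, w_layout, bias):
        g = sigmoid(a * v + c * l + b)  # gate in (0, 1)
        fused.append(g * v + (1.0 - g) * l)  # convex mix of the two cues
    return fused


if __name__ == "__main__":
    # Zero weights and bias give g = 0.5, i.e., an equal mix of both cues.
    print(gated_fusion([1.0, -1.0], [0.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

Because each output is a convex combination of the visual and layout values, the gate can suppress a corrupted appearance feature without letting the fused value leave the range spanned by the two inputs.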
