HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation

Ce Zhang, Simon Stepputtis, Joseph Campbell, Katia Sycara, Yaqi Xie

TL;DR

This work proposes a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset, and introduces a corresponding approach, Hierarchical Knowledge Enhanced Robust Scene Graph Generation (HiKER-SGG), which provides a strong baseline for scene graph generation under such challenging settings.

Abstract

Being able to understand visual scenes is a precursor for many downstream tasks, including autonomous driving, robotics, and other vision-based approaches. A common approach enabling the ability to reason over visual data is Scene Graph Generation (SGG); however, many existing approaches assume undisturbed vision, i.e., the absence of real-world corruptions such as fog, snow, and smoke, as well as non-uniform perturbations like sun glare or water drops. In this work, we propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset. Further, we introduce a corresponding approach, Hierarchical Knowledge Enhanced Robust Scene Graph Generation (HiKER-SGG), providing a strong baseline for scene graph generation under such challenging settings. At its core, HiKER-SGG utilizes a hierarchical knowledge graph to refine its predictions from coarse initial estimates to detailed predictions. In our extensive experiments, we show that HiKER-SGG not only demonstrates superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks. Code is available at https://github.com/zhangce01/HiKER-SGG.
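The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-level hierarchy, the function names, and the toy scores below are all assumptions made for illustration, and the real method operates on a hierarchical knowledge graph rather than a small dictionary. The sketch only shows why committing to a superclass first can help when fine-grained scores are unreliable.

```python
import math

# Hypothetical two-level hierarchy (assumption): superclass -> fine-grained classes.
HIERARCHY = {
    "animal": ["cat", "dog", "horse"],
    "vehicle": ["car", "bus"],
}


def softmax(scores):
    """Numerically stable softmax over a dict of label -> raw score."""
    peak = max(scores.values())
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def coarse_to_fine(super_scores, fine_scores):
    """Pick the superclass first, then the best fine class within it."""
    super_probs = softmax(super_scores)
    best_super = max(super_probs, key=super_probs.get)
    # Restrict the fine-grained decision to children of the chosen superclass.
    children = {c: fine_scores[c] for c in HIERARCHY[best_super]}
    child_probs = softmax(children)
    best_fine = max(child_probs, key=child_probs.get)
    # Joint confidence = P(superclass) * P(fine class | superclass).
    joint = super_probs[best_super] * child_probs[best_fine]
    return best_super, best_fine, joint


# Toy scores (assumption) for a corrupted image of a cat: a flat argmax over
# all fine classes would pick "car", while the hierarchical route picks "cat".
super_scores = {"animal": 2.0, "vehicle": 1.0}
fine_scores = {"cat": 1.2, "dog": 1.0, "horse": 0.1, "car": 1.3, "bus": 0.2}

flat_choice = max(fine_scores, key=fine_scores.get)
result = coarse_to_fine(super_scores, fine_scores)
```

In this toy setting the superclass decision is more confident than any fine-grained score, so constraining the second step to the children of the chosen superclass removes the competing label from another branch.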

Paper Structure (25 sections, 20 equations, 9 figures, 6 tables, 1 algorithm)

Figures (9)

  • Figure 1: We introduce a novel task: robust SGG in the presence of real-world corruptions. Consider an image of a cat obscured by sun glare as an example, where conventional methods often struggle. Our HiKER-SGG leverages hierarchical knowledge to first infer the broader category of an object, for example, $\mathtt{animal}$, before continuing to a more granular identification of an object constrained to various animals. By utilizing such an approach, we simplify the process to correctly identify it as a $\mathtt{cat}$.
  • Figure 2: HiKER-SGG overview. Hierarchical knowledge graphs are constructed from an external knowledge base. Given an image, we first initialize the scene graph using an off-the-shelf detector, Faster-RCNN (Ren et al., 2015). We then create bridging connections between the hierarchical knowledge graph and the initial scene graph and perform message passing for hierarchical graph reasoning. Finally, we design a hierarchical inference process to guide the model in making step-by-step predictions explicitly.
  • Figure 3: Qualitative comparisons on the PredCls task. The visualized predicted predicates are picked from the top 50 predicted triplets. Here, red dashed lines denote undetected predicates, solid red lines denote incorrect predictions, and solid green lines indicate correct predictions. For an easier comparison, predicates correctly predicted by our method but incorrectly by GB-Net are highlighted in dark green.
  • Figure 4: Hyperparameter analysis for $\alpha$ in Equation (\ref{eq:update}).
  • Figure 5: Training time and parameter count of HiKER-SGG compared with other methods.
  • ...and 4 more figures
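
The caption of Figure 3 refers to the top 50 predicted triplets. As a rough sketch of how such a ranking, and a recall-style score over it, could be computed, the example below assumes each candidate triplet already carries a single confidence score. The function names, the toy triplets, and the scores are hypothetical and are not taken from the paper or its code.

```python
def top_k_triplets(candidates, k=50):
    """Return the k highest-scoring (subject, predicate, object) triplets."""
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [triplet for triplet, _score in ranked[:k]]


def recall_at_k(candidates, ground_truth, k=50):
    """Fraction of ground-truth triplets found among the top-k predictions."""
    if not ground_truth:
        return 0.0
    predicted = set(top_k_triplets(candidates, k))
    hits = sum(1 for triplet in ground_truth if triplet in predicted)
    return hits / len(ground_truth)


# Toy candidates (assumption): ((subject, predicate, object), confidence).
candidates = [
    (("cat", "on", "sofa"), 0.9),
    (("cat", "near", "window"), 0.4),
    (("sofa", "in", "room"), 0.7),
]
ground_truth = [("cat", "on", "sofa"), ("cat", "looking at", "window")]

best = top_k_triplets(candidates, k=1)
score = recall_at_k(candidates, ground_truth, k=2)
```

With k=2 only one of the two ground-truth triplets appears among the retained predictions, so the score is one half; a ground-truth triplet that is never ranked highly enough corresponds to the "undetected" case drawn with dashed lines in the figure.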