Leveraging Predicate and Triplet Learning for Scene Graph Generation

Jiankai Li; Yunhong Wang; Xiefan Guo; Ruijie Yang; Weixin Li

TL;DR

The paper addresses two core challenges in Scene Graph Generation (SGG): large visual variation within the same predicate, and the long-tail distribution of predicates. It introduces the Dual-granularity Relation Modeling (DRM) network, which jointly learns coarse-grained predicate cues and fine-grained triplet cues under dual-granularity constraints. A Dual-granularity Knowledge Transfer (DKT) strategy then transfers variation from head predicates/triplets to tail ones, enriching tail patterns and balancing the training data. Experiments on Visual Genome, Open Images, and GQA show state-of-the-art performance, supporting dual-granularity learning as an effective approach to unbiased SGG.

Abstract

Scene Graph Generation (SGG) aims to identify entities and predict the relationship triplets <subject, predicate, object> in visual scenes. Because subject-object pairs vary widely in appearance even within the same predicate, it is challenging to model and refine predicate representations directly across such pairs, yet this is the strategy adopted by most existing SGG methods. We observe that visual variations within an identical triplet are relatively small and that certain relation cues are shared within the same type of triplet, which can facilitate relation learning in SGG. Moreover, for the long-tail problem widely studied in the SGG task, it is also crucial to deal with the limited types and quantity of triplets in tail predicates. Accordingly, in this paper, we propose a Dual-granularity Relation Modeling (DRM) network to leverage fine-grained triplet cues in addition to coarse-grained predicate cues. DRM utilizes the contexts and semantics of predicates and triplets with Dual-granularity Constraints, generating compact and balanced representations from two perspectives to facilitate relation recognition. Furthermore, a Dual-granularity Knowledge Transfer (DKT) strategy is introduced to transfer variation from head predicates/triplets to tail ones, aiming to enrich the pattern diversity of tail classes and alleviate the long-tail problem. Extensive experiments demonstrate the effectiveness of our method, which establishes new state-of-the-art performance on the Visual Genome, Open Images, and GQA datasets. Our code is available at https://github.com/jkli1998/DRM.
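To make the two granularities concrete, the following minimal sketch counts, for each predicate, how many instances it has (coarse granularity) and how many distinct triplet types it covers (fine granularity). The function name `granularity_stats` and the toy annotations are assumptions made for illustration; they are not taken from the paper or its code release.

```python
from collections import Counter, defaultdict


def granularity_stats(triplets):
    """Return {predicate: (instance_count, distinct_triplet_types)}.

    `triplets` is an iterable of (subject, predicate, object) tuples.
    Hypothetical helper for illustration only.
    """
    predicate_counts = Counter()
    triplet_types = defaultdict(set)
    for subj, pred, obj in triplets:
        predicate_counts[pred] += 1                 # coarse-grained: predicate level
        triplet_types[pred].add((subj, pred, obj))  # fine-grained: triplet level
    return {p: (predicate_counts[p], len(triplet_types[p])) for p in predicate_counts}


# Toy annotations (made up): "on" plays the role of a head predicate,
# "eating" the role of a tail predicate.
toy = [
    ("man", "on", "horse"),
    ("cup", "on", "table"),
    ("cup", "on", "table"),
    ("cat", "on", "sofa"),
    ("man", "eating", "pizza"),
]
stats = granularity_stats(toy)
print(stats)
```

In this toy data the tail predicate has both fewer instances and fewer triplet types than the head predicate, which is the situation the abstract describes for tail predicates.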

Paper Structure (21 sections, 9 equations, 9 figures, 12 tables)

This paper contains 21 sections, 9 equations, 9 figures, 12 tables.

Introduction
Related Work
Scene Graph Generation
Unbiased Scene Graph Generation
Method
The DRM Network Backbone
Predicate and Triplet Cue Modeling
Dual-granularity Knowledge Transfer
Experiments
Experimental Settings
Comparison with State-of-the-art Methods
Ablation Study
Visualization Analysis
Conclusion
Additional Implementation Details
...and 6 more sections

Figures (9)

Figure 1: Illustration of large visual variations within the predicate "eating". The same predicate can appear differently under distinct subject-object pairs, with a different set of visual cues in each manifestation. Identifying discriminative relation cues that are shared across diverse subject-object pairs within the same predicate can be challenging. Yet they can be easily captured when the scope is narrowed to an identical triplet.
Figure 2: Comparison of different pipelines for relation recognition. Previous methods focus on learning predicate cues shared across various triplets with diverse visual appearances. Our method learns and leverages both triplet cues within the same triplet and predicate cues across triplets, to better handle the visual diversity.
Figure 3: Illustration of the proposed Dual-granularity Relation Modeling (DRM) network. The learning procedure of DRM is composed of two stages. In the first stage, we capture the coarse-grained predicate cues shared across different subject-object pairs and learn the fine-grained triplet cues under specific subject-object pairs. In the second stage, the Dual-granularity Knowledge Transfer (DKT) strategy transfers the variation from head predicates and their associated triplets to the tail ones. DRM then uses the real instances together with synthetic samples drawn from the calibrated tail distribution to fine-tune the relation classifier, which alleviates the long-tail problem in SGG.
Figure 4: The comparison of t-SNE visualization results on predicate and triplet feature distributions within the VG dataset. "MOTIFS, Triplet" and "DRM w/o DKT, Triplet" visualize the same set of samples, where each unique color represents a different type of triplet.
Figure 5: Recall@100 for all predicate classes, comparing Predicate-Only and DRM w/o DKT on the PredCls task. Predicates are sorted according to their frequency.
...and 4 more figures
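The second stage described in the Figure 3 caption (transferring variation from head classes, then sampling synthetic tail features from a calibrated distribution) can be sketched roughly as follows. This is a generic Gaussian distribution-calibration sketch, not the paper's formulation, which is not given in this summary. The function names, the nearest-mean rule for choosing head classes, and the parameters `k` and `alpha` are all assumptions.

```python
import numpy as np


def class_stats(features, labels, num_classes):
    """Per-class mean and covariance of feature vectors."""
    dim = features.shape[1]
    means, covs = [], []
    for c in range(num_classes):
        x = features[labels == c]
        means.append(x.mean(axis=0))
        # np.cov needs at least two samples; fall back to zeros otherwise.
        covs.append(np.cov(x, rowvar=False) if len(x) > 1 else np.zeros((dim, dim)))
    return np.stack(means), np.stack(covs)


def transfer_variation(means, covs, head_ids, tail_id, k=2, alpha=0.1):
    """Borrow covariance from the k head classes nearest to the tail mean.

    Assumed rule for illustration: average the covariances of the nearest
    head classes and add alpha * I so the result is positive definite.
    """
    head_ids = np.asarray(head_ids)
    dists = np.linalg.norm(means[head_ids] - means[tail_id], axis=1)
    nearest = head_ids[np.argsort(dists)[:k]]
    return covs[nearest].mean(axis=0) + alpha * np.eye(means.shape[1])


def sample_synthetic(mean, cov, n, rng):
    """Draw n synthetic feature vectors from the calibrated Gaussian."""
    return rng.multivariate_normal(mean, cov, size=n)


# Toy features (made up): classes 0 and 1 are head classes, class 2 is a tail class.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], [200, 200, 3])
features = rng.normal(size=(labels.size, 4)) + 3.0 * labels[:, None]

means, covs = class_stats(features, labels, num_classes=3)
tail_cov = transfer_variation(means, covs, head_ids=[0, 1], tail_id=2, k=1)
synthetic = sample_synthetic(means[2], tail_cov, n=50, rng=rng)
print(synthetic.shape)
```

The synthetic tail features would then be mixed with real instances to fine-tune a classifier. Keeping the tail mean and borrowing only the spread from head classes is one simple way to enrich tail diversity without moving the tail class center.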
