Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation

Jaehyeong Jeon; Kibum Kim; Kanghoon Yoon; Chanyoung Park

TL;DR

This work addresses the bias in scene graph generation arising from annotating each subject–object pair with a single predicate, which overlooks the semantic diversity of predicates and is exacerbated by long-tail distributions. The authors propose Semantic Diversity-aware Prototype-based Learning (DPL), a model-agnostic framework that learns a prototype $c_i$ for each predicate and aligns relation features $z$ to these prototypes via $||z-c_i||_2$, while explicitly modeling semantic diversity through Gaussian sampling around prototypes and a matching loss with radius $R$. An orthogonal loss enforces independence among predicate prototypes, and unbiased inference is achieved by normalizing distances with predicate-specific diversity scales $\sigma_i$, enabling better handling of head–tail bias during prediction. Extensive experiments on VG and GQA show that DPL improves baseline SGG models, surpasses existing unbiased methods, and provides interpretable visualizations of predicate semantics, confirming the effectiveness and generality of the approach.
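The distance-based scoring described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. Training is assumed to apply a softmax cross-entropy over negative distances $-\lVert z - c_i \rVert_2$. Unbiased inference is assumed to divide each distance by a scalar $\sigma_i$ and pick the smallest result.

```python
import numpy as np

def prototype_distances(z, prototypes):
    """Euclidean distance ||z - c_i||_2 from each relation feature to each prototype.

    z:          (N, D) relation features
    prototypes: (C, D) one prototype c_i per predicate
    returns:    (N, C) distance matrix
    """
    diff = z[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def biased_training_loss(z, prototypes, labels):
    """ASSUMED form: cross-entropy over negative distances.

    Minimizing it pulls each feature toward the prototype of its labeled predicate.
    """
    logits = -prototype_distances(z, prototypes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def unbiased_predict(z, prototypes, sigma):
    """ASSUMED form: divide each distance by its predicate's diversity scale sigma_i,
    then predict the predicate with the smallest normalized distance."""
    normalized = prototype_distances(z, prototypes) / sigma[None, :]
    return normalized.argmin(axis=1)
```

For example, with prototypes at (0, 0) and (3, 0) and a feature at (1.4, 0), the raw distances are 1.4 and 1.6, so the first predicate wins. With $\sigma = (1, 2)$ the normalized distances become 1.4 and 0.8, so the second predicate wins. Per-predicate scales can therefore change a decision that raw distances alone would make.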

Abstract

The scene graph generation (SGG) task involves detecting objects within an image and predicting predicates that represent the relationships between the objects. However, in SGG benchmark datasets, each subject-object pair is annotated with a single predicate, even though a single predicate may exhibit diverse semantics (i.e., semantic diversity). As a result, existing SGG models are trained to predict the one and only predicate for each pair, which causes them to overlook the semantic diversity that may exist in a predicate and thus leads to biased predictions. In this paper, we propose a novel model-agnostic Semantic Diversity-aware Prototype-based Learning (DPL) framework that enables unbiased predictions based on an understanding of the semantic diversity of predicates. Specifically, DPL learns the region in the semantic space covered by each predicate in order to distinguish among the different semantics that a single predicate can represent. Extensive experiments demonstrate that our proposed model-agnostic DPL framework brings significant performance improvements to existing SGG models and effectively captures the semantic diversity of predicates.
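One way to picture "the region in the semantic space covered by each predicate" is to treat each prototype $c_i$ as the mean of a Gaussian with diversity scale $\sigma_i$. Samples can then be drawn with the reparameterization trick, and a relation feature can be required to lie within radius $R$ of at least one sample. The sketch below assumes that hinge form for the sample matching loss. The function names and the exact loss are hypothetical illustrations, not the paper's formulation. It computes only the forward pass; in training, $c_i$ and $\sigma_i$ would be learnable parameters in an autodiff framework.

```python
import numpy as np

def sample_around_prototype(prototype, sigma, num_samples, rng):
    """Draw samples s = c_i + sigma_i * eps with eps ~ N(0, I) (reparameterization trick).

    prototype: (D,) prototype c_i
    sigma:     scalar or (D,) diversity scale sigma_i
    returns:   (num_samples, D) samples
    """
    eps = rng.standard_normal((num_samples, prototype.shape[0]))
    return prototype[None, :] + sigma * eps

def sample_matching_loss(z, samples, radius):
    """ASSUMED form: hinge on the distance from feature z to its closest sample.

    The loss is zero when at least one sample lies within `radius` of z,
    and grows linearly with the gap otherwise.
    """
    dists = np.linalg.norm(samples - z[None, :], axis=1)
    return max(0.0, dists.min() - radius)
```

With $\sigma_i = 0$ every sample collapses onto the prototype, so the loss reduces to a hinge on the plain prototype distance. A larger $\sigma_i$ spreads the samples and can let features farther from $c_i$ satisfy the radius, which is the sense in which the variance encodes semantic diversity.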

Paper Structure (40 sections, 9 equations, 6 figures, 9 tables)

Introduction
Related work
Scene Graph Generation
Unbiased Scene Graph Generation
Method
Preliminary
Proposal Generation.
Object Class Prediction.
Predicate Class Prediction.
Prototype-based Biased Training
Semantic Diversity Learning
Sample Matching Loss.
Orthogonal Loss.
Unbiased Inference using Normalization
Experiment

Figures (6)

Figure 1: Examples representing the semantic diversity of (a) "on" and (b) "hanging from". (c) Left: Due to the long-tail distribution of the dataset, many relation features are located near the prototype of the head class "on". Right: The result of learning the regions representable by each predicate after our proposed DPL is adopted, where different semantics near the prototype of "on" can be distinguished.
Figure 2: The overall pipeline of the Semantic Diversity-aware Prototype-based Learning (DPL) framework. First, DPL obtains the relation feature of each subject-object pair and creates a prototype corresponding to each predicate. Then, DPL conducts prototype-based biased training to encourage each relation feature to approach its corresponding prototype (Sec. Prototype-based Biased Training). A yellow dot corresponds to a relation feature labeled "on", while the green and red dots correspond to "growing on" and "attached to", respectively. At the same time, DPL conducts semantic diversity learning to capture the regions that each predicate can express (Sec. Semantic Diversity Learning). These regions can be understood as the variances of the prototypes. Lastly, we compute the normalized distance to perform unbiased inference (Sec. Unbiased Inference using Normalization).
Figure 2: The predicted distributions over the predicate classes for Motifs, Motifs+re-weighting, and Motifs+DPL. The purple-highlighted predicate below each image is the ground-truth predicate of the triplet.
Figure 3: Conceptual illustration of applying the orthogonal loss. (a) Without the orthogonal loss, there is a risk of an unexpected overlap between unrelated predicates due to the symmetry of the normal distribution. (b) However, applying the orthogonal loss can prevent such phenomena.
Figure 4: PCA visualization of (a) prototypes and the relation features of subject-object pairs, and (b) prototypes and the samples drawn from each prototype. To better illustrate the trend of the learned variances, all variances are scaled up by the same factor before generating samples. The representations are obtained from DPL with the Motifs backbone.
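Figure 3 motivates an orthogonal loss that keeps predicate prototypes independent. The paper's exact formulation is not reproduced here. A common way to encourage independence, assumed in this sketch, is to L2-normalize the prototypes and penalize the squared Frobenius distance between their Gram matrix (pairwise cosine similarities) and the identity. The function name is hypothetical. The loss can only reach zero when the number of predicates does not exceed the feature dimension.

```python
import numpy as np

def orthogonal_loss(prototypes):
    """ASSUMED form: ||C_n C_n^T - I||_F^2, where C_n holds L2-normalized prototypes.

    prototypes: (C, D) one prototype per predicate.
    Off-diagonal Gram entries are cosine similarities between distinct prototypes,
    so the loss is zero exactly when all prototypes are mutually orthogonal.
    """
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    gram = normed @ normed.T
    return np.sum((gram - np.eye(len(prototypes))) ** 2)
```

For instance, the parallel prototypes (1, 0) and (2, 0) give a loss of 2 (one unit of squared cosine similarity per off-diagonal entry), while orthogonal prototypes give 0 regardless of their lengths.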
