Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation

KuanChao Chu, Satoshi Yamazaki, Hideki Nakayama

TL;DR

This work tackles the data sparsity and semantic ambiguity of informative relational triplets in Scene Graph Generation by introducing two complementary modules: Feature Space Triplet Augmentation (FSTA) and Soft Transfer. FSTA creates diverse artificial triplets in the feature space and uses a biased MP-sampler with a pre-trained generator to regularize relation prediction, while Soft Transfer assigns non-binary, reliability-weighted labels to reassigned predicates to preserve head-class performance. Together, they improve both Recall and mean Recall on Visual Genome across Motif and RelDN models, outperforming IETrans and mitigating typical head-tail trade-offs. The approach is model-agnostic, data-efficient, and validated through extensive ablations and analyses, highlighting practical gains for unbiased SGG and potential applicability to related compositional learning problems.
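To make the idea of non-binary, reliability-weighted labels concrete, the following is a minimal sketch and not the authors' implementation. It assumes a triplet relabeled from a general predicate to an informative one keeps probability mass `1 - weight` on the original predicate and receives `weight` on the transferred one, and that training uses cross-entropy against that soft target. The function names, the mixing rule, and the `weight` parameter are all hypothetical illustrations.

```python
import math


def soft_target(num_classes, original, transferred, weight):
    """Build a soft predicate label (hypothetical mixing rule).

    `weight` is an assumed reliability score in [0, 1] for the transferred
    (informative) predicate; the remainder stays on the original predicate.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    target = [0.0] * num_classes
    target[original] += 1.0 - weight
    target[transferred] += weight
    return target


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target and softmax(logits)."""
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target, logits))


# Illustrative classes only: 0 = "on" (general), 2 = "sitting on" (informative).
target = soft_target(num_classes=3, original=0, transferred=2, weight=0.7)
loss = soft_cross_entropy([1.0, 0.0, 2.0], target)
```

With `weight = 1.0` this reduces to an ordinary hard relabeling, so under these assumptions the soft variant can be read as a relaxation of a hard data transfer that keeps some supervision on the head class.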

Abstract

This work focuses on enhancing the training dataset of informative relational triplets for Scene Graph Generation (SGG). Because effective supervision is lacking, current SGG models predict poorly on informative relational triplets that have too few training samples. We therefore propose two novel training dataset enhancement modules: Feature Space Triplet Augmentation (FSTA) and Soft Transfer. FSTA leverages a feature generator trained to generate representations of an object in relational triplets. Its biased-prediction-based sampling efficiently augments artificial triplets, focusing on the challenging ones. In addition, we introduce Soft Transfer, which assigns soft predicate labels to general relational triplets, effectively providing more supervision for informative predicate classes. Experimental results show that integrating FSTA and Soft Transfer achieves high levels of both Recall and mean Recall on the Visual Genome dataset. The mean of Recall and mean Recall is the highest among all existing model-agnostic methods.
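The combined metrics can be sketched in a few lines. The abstract defines the averaged score as the mean of Recall and mean Recall; the F1 score is assumed here to be their harmonic mean, which is the usual convention but is not stated in this excerpt. The function names and the input values are illustrative and are not results from the paper.

```python
def avg_metric(recall, mean_recall):
    """Arithmetic mean of Recall@K and mean Recall@K."""
    return (recall + mean_recall) / 2.0


def f1_metric(recall, mean_recall):
    """Harmonic mean of Recall@K and mean Recall@K (assumed definition)."""
    total = recall + mean_recall
    if total == 0:
        return 0.0
    return 2.0 * recall * mean_recall / total


# Made-up values purely for illustration.
print(avg_metric(60.0, 40.0))  # 50.0
print(f1_metric(60.0, 40.0))   # 48.0
```

Because the harmonic mean penalizes imbalance more strongly than the arithmetic mean, a method that trades away Recall for mean Recall (or the reverse) scores lower on F1 than on the averaged metric.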

Paper Structure

This paper contains 28 sections, 10 equations, 8 figures, 16 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Accuracy comparison between FSTA, Soft Transfer, Full, and the baseline IETrans on Motif ($1^{st}$ row) and RelDN ($2^{nd}$ row). In the scatter plots (left), a larger dot size and a darker color represent higher F1@100 and AVG@100 scores, respectively. As shown in the bar plots (right), the higher overall metrics (F1@100 and AVG@100) indicate that our full method, which consists of two complementary modules, alleviates the performance trade-off.
  • Figure 2: The system overview of our proposed method. The FSTA and Soft Transfer modules are designed to introduce new concepts to enhance the baseline dataset manipulation module, IETrans. Blocks indicated in blue are prepared during the pre-processing stage, whereas the blocks in purple are designated for the unbiased SGG model training stage.
  • Figure 3: Building combinations from batch input proposals. Purple box pairs are excluded for low IoU with ground-truth relations and red box pairs are selected as candidates.
  • Figure 4: The schematic view illustrates the combination of FSTA and SGG models. We visualize only the flow of $\mathcal{T}_{spo'}$ with red dotted lines for readability. The green dotted line indicates the point at which features are collected in the preparation stage.
  • Figure 5: An example training image from the Visual Genome dataset.
  • ...and 3 more figures