Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation
KuanChao Chu, Satoshi Yamazaki, Hideki Nakayama
TL;DR
This work tackles the data sparsity and semantic ambiguity of informative relational triplets in Scene Graph Generation by introducing two complementary modules: Feature Space Triplet Augmentation (FSTA) and Soft Transfer. FSTA creates diverse artificial triplets in the feature space and uses a biased MP-sampler with a pre-trained generator to regularize relation prediction, while Soft Transfer assigns non-binary, reliability-weighted labels to reassigned predicates to preserve head-class performance. Together, they improve both Recall and mean Recall on Visual Genome across Motif and RelDN models, outperforming IETrans and mitigating typical head-tail trade-offs. The approach is model-agnostic, data-efficient, and validated through extensive ablations and analyses, highlighting practical gains for unbiased SGG and potential applicability to related compositional learning problems.
Abstract
This work focuses on enhancing the training data for informative relational triplets in Scene Graph Generation (SGG). Because effective supervision is lacking, current SGG models perform poorly on informative relational triplets that have few training samples. We therefore propose two novel training dataset enhancement modules: Feature Space Triplet Augmentation (FSTA) and Soft Transfer. FSTA leverages a feature generator trained to generate representations of an object in relational triplets. Its sampling, which is based on biased predictions, efficiently augments the training set with artificial triplets and concentrates on the challenging ones. In addition, we introduce Soft Transfer, which assigns soft predicate labels to general relational triplets and thereby provides more effective supervision for informative predicate classes. Experimental results show that integrating FSTA and Soft Transfer achieves high levels of both Recall and mean Recall on the Visual Genome dataset. The mean of Recall and mean Recall is the highest among all existing model-agnostic methods.
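To make the idea of a soft predicate label concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a triplet annotated with a general predicate is reassigned toward an informative predicate with a scalar weight; the names `soft_transfer_label`, `soft_cross_entropy`, and `reliability`, as well as the linear split of probability mass, are hypothetical choices made only for illustration.

```python
import numpy as np


def soft_transfer_label(num_classes, general_idx, informative_idx, reliability):
    """Build a non-binary predicate label (illustrative assumption).

    `reliability` in [0, 1] is the share of probability mass moved from the
    original general predicate to the informative predicate.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    label = np.zeros(num_classes, dtype=np.float64)
    label[general_idx] += 1.0 - reliability   # mass kept on the general class
    label[informative_idx] += reliability     # mass moved to the informative class
    return label


def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between a soft target and softmax(logits)."""
    shifted = logits - logits.max()                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return float(-(soft_label * log_probs).sum())


# Toy example: 4 predicate classes, class 0 is general, class 2 is informative.
logits = np.array([2.0, 0.1, 1.5, -1.0])
label = soft_transfer_label(num_classes=4, general_idx=0,
                            informative_idx=2, reliability=0.7)
loss = soft_cross_entropy(logits, label)
```

With `reliability=1.0` the label collapses to a one-hot reassignment and with `reliability=0.0` it keeps the original annotation, so a soft label of this kind interpolates between the two and leaves some supervision on the head class.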
