Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction

Yansheng Li, Tingzhu Wang, Kang Wu, Linlin Wang, Xin Guo, Wenbin Wang

TL;DR

This work tackles the long-tailed problem in Scene Graph Generation by introducing Sample-Level Bias Prediction (SBP), which predicts per-sample correction biases from the union-region context to refine coarse predictions into fine-grained relations. A Bias-Oriented Generative Adversarial Network (BGAN) is trained to approximate these sample-specific biases, with the corrected logits given by $\hat{\mathbf{z}} = \mathbf{z} + \boldsymbol{b_s}$. The approach constructs a correction bias set $\mathcal{S}$ from a baseline SGG model and jointly optimizes a classic SGG loss $\mathcal{L}_{SGG}$ and the adversarial loss $\mathcal{L}_{BGAN}$ via $\mathcal{L}_{total} = \mathcal{L}_{SGG} + \beta \cdot \mathcal{L}_{BGAN}$. Experiments on Visual Genome, GQA, and VG-1800 show consistent improvements in Average@K across Motif, VCtree, and Transformer backbones, outperforming dataset-level correction methods by notable margins and demonstrating the method’s generalization to one-stage SGG and related long-tailed tasks. The results highlight the value of sample-specific, region-aware bias correction for enriching scene graph quality and tail relationship discovery.
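The two formulas above can be illustrated with a small worked sketch. The predicate names are taken from the paper's figure captions, but the logit values, the bias values, and the value of $\beta$ are made-up numbers chosen only to show the arithmetic; the helper names `correct_logits` and `total_loss` are hypothetical and are not taken from the authors' code.

```python
# Illustrative sketch only: all numbers and helper names are assumptions,
# not values or functions from the paper or its repository.

def correct_logits(z, b_s):
    """Apply a sample-specific correction bias: z_hat = z + b_s."""
    if len(z) != len(b_s):
        raise ValueError("logits and bias must have the same length")
    return [zi + bi for zi, bi in zip(z, b_s)]


def total_loss(l_sgg, l_bgan, beta):
    """Combine the two objectives: L_total = L_SGG + beta * L_BGAN."""
    return l_sgg + beta * l_bgan


def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


PREDICATES = ["on", "walking on", "parked on"]

z = [4.0, 2.5, 0.5]      # hypothetical pre-softmax scores from a baseline model
b_s = [-1.0, 1.5, 0.0]   # hypothetical bias predicted for this one sample

z_hat = correct_logits(z, b_s)
before = PREDICATES[argmax(z)]       # coarse-grained prediction
after = PREDICATES[argmax(z_hat)]    # prediction after correction

print(before, "->", after)
print(total_loss(l_sgg=1.0, l_bgan=0.5, beta=0.5))
```

In this toy case the bias lowers the score of the coarse-grained predicate and raises the score of the fine-grained one, so the top-scoring class changes while the original logits stay untouched.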

Abstract

Scene Graph Generation (SGG) aims to explore the relationships between objects in images and obtain scene summary graphs, thereby better serving downstream tasks. However, the long-tailed problem has adversely affected the quality of scene graphs: predictions are dominated by coarse-grained relationships and lack the more informative fine-grained ones. The union region of one object pair (i.e., one sample) contains rich and dedicated contextual information, enabling the prediction of a sample-specific bias for refining the original relationship prediction. Therefore, we propose a novel Sample-Level Bias Prediction (SBP) method for fine-grained SGG (SBG). First, we train a classic SGG model and construct a correction bias set by calculating the margin between the ground-truth label and the label predicted by that model. Then, we devise a Bias-Oriented Generative Adversarial Network (BGAN) that learns to predict the constructed correction biases, which can be used to correct the original predictions from coarse-grained relationships to fine-grained ones. Extensive experimental results on the VG, GQA, and VG-1800 datasets demonstrate that SBG outperforms state-of-the-art methods in terms of Average@K across three mainstream SGG models: Motif, VCtree, and Transformer. Compared to dataset-level correction methods on VG, SBG shows a significant average improvement of 5.6%, 3.9%, and 3.2% on Average@K for the PredCls, SGCls, and SGDet tasks, respectively. The code will be available at https://github.com/Zhuzi24/SBG.
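The abstract does not spell out how the margin is computed, so the following is only one plausible reading, sketched under stated assumptions: for each sample the baseline gets wrong, the target bias is zero everywhere except at the ground-truth class, where it equals the gap between the predicted-class logit and the ground-truth logit plus a small slack `eps`. The function names, the `eps` parameter, and all numbers are hypothetical; the paper's actual construction of $\mathcal{S}$ may differ.

```python
# Hedged sketch of a correction bias set. The margin rule below is an
# assumption made for illustration, not the paper's confirmed definition.

def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def correction_bias(z, gt_index, eps=0.5):
    """Return a bias that lifts the ground-truth class above the prediction.

    z        : baseline pre-softmax scores for one sample
    gt_index : index of the ground-truth predicate
    eps      : assumed slack so the ground truth wins strictly
    """
    pred_index = argmax(z)
    bias = [0.0] * len(z)
    if pred_index != gt_index:
        # Margin between the predicted class and the ground-truth class.
        bias[gt_index] = z[pred_index] - z[gt_index] + eps
    return bias


def build_bias_set(samples, eps=0.5):
    """Collect one target bias per (logits, ground-truth index) sample."""
    return [correction_bias(z, gt, eps) for z, gt in samples]


# Two toy samples over the classes ["on", "walking on", "parked on"].
samples = [
    ([4.0, 2.5, 0.5], 1),  # baseline says "on", ground truth is "walking on"
    ([1.0, 3.0, 0.0], 1),  # baseline is already correct
]
bias_set = build_bias_set(samples)
print(bias_set)
```

Under this reading, samples the baseline already classifies correctly contribute an all-zero bias, and a generator such as BGAN would be trained to reproduce these targets from union-region context.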

Paper Structure (15 sections, 7 equations, 10 figures, 16 tables, 1 algorithm)

Figures (10)

  • Figure 1: (a) The long-tailed distribution of relationships in the well-known Visual Genome (VG) dataset. (b) Workflow for bias correction: a sample-specific correction bias is predicted from the contextual information in the union region of $\langle$man, beach$\rangle$ and the original prediction; it then corrects the prediction from the coarse-grained "on" to the fine-grained "walking on". The y-axis label "classification score" in the figure denotes the classification value before the softmax function.
  • Figure 2: Comparison of our SBG with DLFE and RTPB for corrections. DLFE and RTPB apply the same $\mathbf{c}$ or $\mathbf{b}$ to correct all predictions, while our SBG predicts a sample-specific bias for each prediction.
  • Figure 3: The overall structure of our SBG. After the correction bias set is constructed, BGAN learns to predict the constructed correction biases to achieve sample-level bias correction. The coarse-grained scene graph is generated by the classic SGG model. $v$ is the value at "parked on" in $\mathbf{b}^{tru}$.
  • Figure 4: The construction workflow of Correction Bias Set $\mathcal{S}$.
  • Figure 5: Comparison with DLFE and RTPB on the overall performance.
  • ...and 5 more figures