A Causal Adjustment Module for Debiasing Scene Graph Generation

Li Liu, Shuzhou Sun, Shuaifeng Zhi, Fan Shi, Zhen Liu, Janne Heikkilä, Yongxiang Liu

TL;DR

The paper tackles bias in Scene Graph Generation (SGG) by arguing that model bias stems from skewed object and object-pair distributions, not only from the long-tailed relationship distribution. It develops a causal framework, the Mediator-based Causal Chain Model (MCCM), which introduces a mediator $\mathcal{C}$ representing co-occurrence to refine the causal path $\mathcal{O} \rightarrow \mathcal{C} \rightarrow \mathcal{P} \rightarrow \mathcal{R}$, and a lightweight Causal Adjustment Module (CAModule) that outputs triplet-level logit adjustments derived from these causal factors. CAModule enables zero-shot relationship composition and achieves state-of-the-art mean recall $\text{mR@K}$ across multiple backbones and datasets, with notable gains in zero-shot recall $\text{zR@K}$. The approach is model-agnostic and integrates as a lightweight post-hoc adjustment, offering practical debiasing with modest computational overhead and improved robustness across head and tail predicates.

Abstract

While recent debiasing methods for Scene Graph Generation (SGG) have shown impressive performance, these efforts often attribute model bias solely to the long-tail distribution of relationships, overlooking the more profound causes stemming from skewed object and object pair distributions. In this paper, we employ causal inference techniques to model the causality among these observed skewed distributions. Our insight lies in the ability of causal inference to capture the unobservable causal effects between complex distributions, which is crucial for tracing the roots of model bias. Specifically, we introduce the Mediator-based Causal Chain Model (MCCM), which, in addition to modeling causality among objects, object pairs, and relationships, incorporates a mediator variable, i.e., the co-occurrence distribution, to complement the causality. Following this, we propose the Causal Adjustment Module (CAModule) to estimate the modeled causal structure, using variables from MCCM as inputs to produce a set of adjustment factors aimed at correcting biased model predictions. Moreover, our method enables the composition of zero-shot relationships, thereby enhancing the model's ability to recognize such relationships. Experiments conducted across various SGG backbones and popular benchmarks demonstrate that CAModule achieves state-of-the-art mean recall rates, with significant improvements also observed on the challenging zero-shot recall rate metric.
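
The four distributions named in the abstract (objects, co-occurrence, object pairs, relationships) can be illustrated with a minimal counting sketch. It assumes a hypothetical toy annotation format and simple frequency-based definitions; the `images` data, the helper names, and the choice to count co-occurrence as unordered category pairs per image are all assumptions for illustration, not the paper's definitions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: each image lists its object categories and
# its (subject, relationship, object) triplets. Not data from the paper.
images = [
    {"objects": ["man", "dog", "grass"],
     "triplets": [("man", "holding", "dog"), ("dog", "standing on", "grass")]},
    {"objects": ["man", "snow"],
     "triplets": [("man", "standing on", "snow")]},
    {"objects": ["cat", "table", "man"],
     "triplets": [("cat", "standing on", "table")]},
]

def normalize(counter):
    """Turn raw counts into an empirical probability distribution."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}

def empirical_distributions(images):
    """Count objects, co-occurrences, annotated pairs, and relationships."""
    objects, cooccurrence = Counter(), Counter()
    pairs, relationships = Counter(), Counter()
    for image in images:
        objects.update(image["objects"])
        # Assumed definition: unordered pairs of distinct categories that
        # appear in the same image, whether or not a relationship is annotated.
        categories = sorted(set(image["objects"]))
        cooccurrence.update(combinations(categories, 2))
        for subject, relationship, obj in image["triplets"]:
            pairs[(subject, obj)] += 1        # ordered, annotated pairs only
            relationships[relationship] += 1
    return tuple(normalize(c) for c in (objects, cooccurrence, pairs, relationships))

p_obj, p_cooc, p_pair, p_rel = empirical_distributions(images)
print(p_rel)
```

Even in this toy data the co-occurrence distribution covers pairs such as (man, table) that never appear in an annotated triplet, which is one way to read the role of co-occurrence as a complement to the object-pair distribution.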

Paper Structure

This paper contains 15 sections, 41 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: The motivations of our proposed CAModule. (a): Distribution of 150 object categories. (b): Distribution of 50 relationship categories. (c): Distribution of object pairs. Theoretically, there could be 150×150 possible object pair combinations; because the object pairs that actually occur are sparse, only a small number of pairs is displayed. (d): Distribution of object pairs within triplets formed by the "standing on" relationship. (e): Accuracy for different triplets across various relationships. Here, for a more intuitive display of the model's performance across diverse triplets, accuracy rather than recall rate is computed. Experiments in this figure are conducted on the VG150 dataset.
  • Figure 2: The typical pipeline of a biased scene graph generation model and the proposed Causal Adjustment Module (CAModule) of this paper. The SGG pipeline primarily comprises an object detector and a relationship classifier, where the former detects the categories and positional information of objects within the images, while the latter classifies the relationship features of each pair of objects. CAModule takes the object distribution, co-occurrence distribution, object pair distribution, and relationship distribution as inputs and outputs a set of fine-grained adjustment factors for adjusting the logits output by the biased SGG model.
  • Figure 3: Structural Causal Model (SCM) of typical scene graph generation framework (a) and our proposed method (b). $\mathcal{O}$, $\mathcal{C}$, $\mathcal{P}$, and $\mathcal{R}$ represent the distributions of object, co-occurrence, object pair, and relationship, respectively.
  • Figure 4: The co-occurrence distributions of two examples (man, dog). For simplicity, only the top 10 co-occurring objects are shown for each example. Solid circles represent the magnitude of co-occurrence, with larger circles indicating a greater likelihood of two objects appearing together in the same scene, and smaller circles suggesting a lower likelihood.
  • Figure 5: Comparison between relationship-level adjustment and triplet-level adjustment. The three examples are different triplets composed of the same relationship; for illustration, they could be $<$people, standing on, snow$>$, $<$cat, standing on, table$>$, and $<$bird, standing on, branch$>$. (a) shows predictions for the three different triplets. (b) illustrates relationship-level adjustment: because the three triplets belong to the same relationship category, the adjustment direction is the same for all instances. (c) depicts triplet-level adjustment: in this finer-grained scheme, the adjustment direction differs across the triplets.
  • ...and 5 more figures
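
The contrast drawn in the Figure 5 caption between relationship-level and triplet-level adjustment can be sketched with a simple stand-in. The sketch below uses classic log-prior logit adjustment with add-one smoothing, not the learned adjustment factors that CAModule produces; the training counts, the two-relationship vocabulary, the raw logits, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training triplets (subject, relationship, object); not from the paper.
train = (
    [("people", "standing on", "snow")] * 6
    + [("cat", "standing on", "table")] * 1
    + [("people", "on", "snow")] * 2
    + [("cat", "on", "table")] * 8
    + [("bird", "on", "branch")] * 3
)
relationships = ["on", "standing on"]

rel_counts = Counter(r for _, r, _ in train)
triplet_counts = Counter(train)
pair_counts = Counter((s, o) for s, _, o in train)

def relationship_level(subject, obj, smoothing=1.0):
    """One offset per relationship, -log p(r); the object pair is ignored."""
    total = len(train) + smoothing * len(relationships)
    return {r: -math.log((rel_counts[r] + smoothing) / total) for r in relationships}

def triplet_level(subject, obj, smoothing=1.0):
    """One offset per (pair, relationship), -log p(r | subject, object)."""
    total = pair_counts[(subject, obj)] + smoothing * len(relationships)
    return {
        r: -math.log((triplet_counts[(subject, r, obj)] + smoothing) / total)
        for r in relationships
    }

def adjust(logits, offsets):
    """Add the offsets to the biased logits."""
    return {r: logits[r] + offsets[r] for r in logits}

biased_logits = {"on": 2.0, "standing on": 1.5}  # same raw prediction for every pair
for pair in [("people", "snow"), ("cat", "table"), ("bird", "branch")]:
    rel_adj = adjust(biased_logits, relationship_level(*pair))
    tri_adj = adjust(biased_logits, triplet_level(*pair))
    print(pair, max(rel_adj, key=rel_adj.get), max(tri_adj, key=tri_adj.get))
```

In this toy setup the relationship-level offsets are identical for all three pairs, so every triplet is pushed in the same direction, whereas the triplet-level offsets favor different relationships depending on the pair. The smoothing term also gives the unseen triplet (bird, standing on, branch) a finite offset, which is only a loose analogy to the zero-shot composition described in the paper.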