A Causal Adjustment Module for Debiasing Scene Graph Generation
Li Liu, Shuzhou Sun, Shuaifeng Zhi, Fan Shi, Zhen Liu, Janne Heikkilä, Yongxiang Liu
TL;DR
The paper tackles bias in Scene Graph Generation (SGG) by arguing that skew in the object and object-pair distributions, not just the long-tailed relationship distribution, underpins model bias. It develops a causal framework, the Mediator-based Causal Chain Model (MCCM), which introduces a mediator $\mathcal{C}$ representing co-occurrence to refine the causal path $\mathcal{O} \rightarrow \mathcal{C} \rightarrow \mathcal{P} \rightarrow \mathcal{R}$ over objects $\mathcal{O}$, object pairs $\mathcal{P}$, and relationships $\mathcal{R}$, and a lightweight Causal Adjustment Module (CAModule) that outputs triplet-level logit adjustments derived from these causal factors. CAModule enables zero-shot relationship composition and achieves state-of-the-art mean recall $\text{mR@K}$ across multiple backbones and datasets, with notable gains in zero-shot recall $\text{zR@K}$. The approach is model-agnostic and integrates as a post-hoc adjustment, offering practical debiasing with modest computational overhead and improved robustness across head and tail predicates.
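To make the idea of a post-hoc, triplet-level logit adjustment concrete, the following is a minimal sketch, not the paper's CAModule. It assumes the adjustment is simply added to a backbone's predicate logits before the softmax. The function name `adjust_logits`, the three predicate classes, and the prior values are hypothetical, and the adjustment shown here is the classic log-prior subtraction used as a stand-in for the factors that CAModule would produce from the MCCM variables.

```python
import numpy as np

def adjust_logits(logits, adjustment):
    """Add adjustment factors to predicate logits and return softmax probabilities.

    logits:     array of shape (num_triplets, num_predicates) from any SGG backbone.
    adjustment: array broadcastable to the same shape (hypothetical stand-in for
                the factors a learned adjustment module would output).
    """
    adjusted = logits + adjustment
    # Subtract the row-wise maximum for numerical stability before exponentiating.
    adjusted = adjusted - adjusted.max(axis=-1, keepdims=True)
    probs = np.exp(adjusted)
    return probs / probs.sum(axis=-1, keepdims=True)

# Hypothetical example with three predicates: "on" (head), "standing on", "riding" (tail).
logits = np.array([[2.0, 1.5, 0.5]])      # biased backbone output for one triplet
prior = np.array([0.80, 0.15, 0.05])      # assumed training frequency of each predicate
adjustment = -np.log(prior)               # stand-in: penalize frequent predicates

probs = adjust_logits(logits, adjustment)
biased_choice = int(np.argmax(logits, axis=-1)[0])
debiased_choice = int(np.argmax(probs, axis=-1)[0])
```

In this toy case the unadjusted logits favor the head predicate, while the adjusted scores favor the tail predicate. Because the adjustment is applied to the output logits only, the backbone itself stays unchanged, which is what makes this style of correction model-agnostic.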
Abstract
While recent debiasing methods for Scene Graph Generation (SGG) have shown impressive performance, these efforts often attribute model bias solely to the long-tail distribution of relationships, overlooking the more profound causes stemming from skewed object and object-pair distributions. In this paper, we employ causal inference techniques to model the causality among these observed skewed distributions. Our insight lies in the ability of causal inference to capture the unobservable causal effects between complex distributions, which is crucial for tracing the roots of model bias. Specifically, we introduce the Mediator-based Causal Chain Model (MCCM), which, in addition to modeling causality among objects, object pairs, and relationships, incorporates a mediator variable, namely the co-occurrence distribution, to complement this causal structure. Following this, we propose the Causal Adjustment Module (CAModule) to estimate the modeled causal structure, using variables from MCCM as inputs to produce a set of adjustment factors aimed at correcting biased model predictions. Moreover, our method enables the composition of zero-shot relationships, thereby enhancing the model's ability to recognize such relationships. Experiments conducted across various SGG backbones and popular benchmarks demonstrate that CAModule achieves state-of-the-art mean recall rates, with significant improvements also observed on the challenging zero-shot recall rate metric.
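The emphasis on mean recall rather than plain recall matters because the two metrics weight predicates differently: recall pools all ground-truth triplets, so head predicates dominate, whereas mean recall averages the per-predicate recalls, so each predicate counts equally. The sketch below illustrates that difference under simplifying assumptions: it pools triplets over a whole dataset rather than averaging per image, and it takes a precomputed `hits` list (whether each ground-truth triplet was recovered among the top K predictions) as input. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def recall_and_mean_recall(gt_predicates, hits):
    """Return (recall, mean_recall) for a list of ground-truth triplets.

    gt_predicates: predicate label of each ground-truth triplet.
    hits:          True if that triplet was recovered in the top-K predictions.
    """
    per_class = defaultdict(lambda: [0, 0])  # predicate -> [num_hits, num_total]
    for predicate, hit in zip(gt_predicates, hits):
        per_class[predicate][0] += int(hit)
        per_class[predicate][1] += 1

    total_hits = sum(h for h, _ in per_class.values())
    total = sum(n for _, n in per_class.values())
    recall = total_hits / total
    # Mean recall: average the per-predicate recalls so every class counts equally.
    mean_recall = sum(h / n for h, n in per_class.values()) / len(per_class)
    return recall, mean_recall

# Toy data: a head predicate that is always recovered and a tail predicate that never is.
gt = ["on"] * 8 + ["riding"] * 2
hits = [True] * 8 + [False] * 2
recall, mean_recall = recall_and_mean_recall(gt, hits)
```

On this toy data a model that ignores the tail predicate still reaches a recall of 0.8, but its mean recall is only 0.5, which is why debiasing methods are typically compared on mean recall.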
