RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation

Hae-Won Jo; Yeong-Jun Cho

TL;DR

RS-Net tackles two core challenges in dynamic scene graph generation (DSGG): the lack of supervision for non-annotated object pairs and the insufficiency of short temporal glimpses. It introduces a modular framework with a spatial context encoder, a temporal context encoder, and a relation scoring decoder that together evaluate the contextual relevance of each object pair across an entire video. By integrating a video-level temporal context token into relation representations and multiplying the resulting context score with the conventional triplet score, RS-Net consistently improves Recall, Precision, and mean Recall across diverse DSGG baselines on Action Genome while maintaining competitive efficiency. The approach offers a practical, generalizable way to strengthen relational reasoning in dynamic scenes, yielding better predicate ranking and more accurate scene graphs for video understanding.

Abstract

Dynamic Scene Graph Generation (DSGG) models how object relations evolve over time in videos. However, existing methods are trained only on annotated object pairs and lack guidance for non-related pairs, making it difficult to identify meaningful relations during inference. In this paper, we propose the Relation Scoring Network (RS-Net), a modular framework that scores the contextual importance of object pairs using both spatial interactions and long-range temporal context. RS-Net consists of a spatial context encoder with learnable context tokens and a temporal encoder that aggregates video-level information. The resulting relation scores are integrated into a unified triplet scoring mechanism to enhance relation prediction. RS-Net can be easily integrated into existing DSGG models without architectural changes. Experiments on the Action Genome dataset show that RS-Net consistently improves both Recall and Precision across diverse baselines, with notable gains in mean Recall, highlighting its ability to address the long-tailed distribution of relations. Despite the increased number of parameters, RS-Net maintains competitive efficiency while achieving superior performance over state-of-the-art methods.
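
The score integration described above can be illustrated with a minimal sketch. It assumes each candidate triplet already carries a baseline triplet score from a DSGG model and a per-pair relation score in [0, 1] from a scoring network; the `Candidate` class, the `rescore` function, the field names, and all numbers are hypothetical and chosen only for illustration. They are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    subject: str
    predicate: str
    obj: str
    triplet_score: float   # baseline DSGG triplet score (illustrative value)
    relation_score: float  # context score for the object pair, in [0, 1] (illustrative value)


def rescore(candidates):
    """Rank candidates by triplet_score * relation_score, highest first."""
    return sorted(
        candidates,
        key=lambda c: c.triplet_score * c.relation_score,
        reverse=True,
    )


# Toy candidates for one frame; the values are made up.
candidates = [
    Candidate("person", "holding", "cup", 0.60, 0.90),
    Candidate("person", "looking_at", "sofa", 0.70, 0.20),
    Candidate("person", "sitting_on", "chair", 0.50, 0.80),
]

ranked = rescore(candidates)
for c in ranked:
    print(c.subject, c.predicate, c.obj, round(c.triplet_score * c.relation_score, 2))
```

In this toy example the person-sofa pair has the highest baseline triplet score, but its low relation score moves it to the bottom of the ranking. That is the intended effect of the multiplication: pairs judged contextually irrelevant are down-weighted before the top-K triplets are selected.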
