Towards Scene Graph Anticipation
Rohith Peddi, Saksham Singh, Saurabh, Parag Singla, Vibhav Gogate
TL;DR
This paper introduces Scene Graph Anticipation (SGA), the task of forecasting future fine-grained relationships between objects in video-based scene graphs. It presents SceneSayer, a continuous-time framework built from three components: an Object Representation Processing Unit, a Spatial Context Processing Unit, and a Latent Dynamics Processing Unit. The framework models the evolution of object interactions with NeuralODEs and NeuralSDEs. In experiments on Action Genome, SceneSayer (especially the SDE variant) improves long-horizon relation anticipation across the AGS, PGAGS, and GAGS settings, and ablations highlight the benefits of stochastic dynamics modeling and of the Stratonovich interpretation. The work advances anticipatory scene understanding, with potential applications in video surveillance, robotics, and autonomous systems, by providing uncertainty-aware relational forecasts beyond 30 seconds into the future.
Abstract
Spatio-temporal scene graphs represent interactions in a video by decomposing scenes into individual objects and their pair-wise temporal relationships. Long-term anticipation of these fine-grained pair-wise relationships between objects is a challenging problem. To this end, we introduce the task of Scene Graph Anticipation (SGA). We adapt state-of-the-art scene graph generation methods as baselines to anticipate future pair-wise relationships between objects, and we propose a novel approach, SceneSayer. In SceneSayer, we leverage object-centric representations of relationships to reason about the observed video frames and to model the evolution of relationships between objects. We take a continuous-time perspective and model the latent dynamics of the evolution of object interactions using the concepts of NeuralODE and NeuralSDE. We infer representations of future relationships by solving an Ordinary Differential Equation and a Stochastic Differential Equation, respectively. Extensive experimentation on the Action Genome dataset validates the efficacy of the proposed methods.
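To make the continuous-time idea concrete, the following is a minimal sketch, not the SceneSayer implementation. It assumes a relationship is summarized by a small latent vector and rolls that vector forward in time, once with a fixed-step Euler solver for an ODE and once with Euler-Maruyama for an SDE. The names `drift`, `euler_ode`, and `euler_maruyama_sde` are hypothetical, the linear drift stands in for a learned network, and the diffusion is assumed constant (additive noise), in which case the Itô and Stratonovich interpretations coincide.

```python
import math
import random


def drift(z):
    # Hypothetical stand-in for a learned drift network:
    # a simple linear decay of each latent coordinate.
    return [-0.5 * zi for zi in z]


def euler_ode(z0, t0, t1, steps):
    """Integrate dz/dt = drift(z) from t0 to t1 with fixed-step Euler."""
    dt = (t1 - t0) / steps
    z = list(z0)
    for _ in range(steps):
        f = drift(z)
        z = [zi + dt * fi for zi, fi in zip(z, f)]
    return z


def euler_maruyama_sde(z0, t0, t1, steps, sigma, rng):
    """Integrate dz = drift(z) dt + sigma dW with Euler-Maruyama.

    sigma is an assumed constant diffusion coefficient (additive noise).
    """
    dt = (t1 - t0) / steps
    sqrt_dt = math.sqrt(dt)
    z = list(z0)
    for _ in range(steps):
        f = drift(z)
        # Each coordinate receives an independent Gaussian increment.
        z = [
            zi + dt * fi + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            for zi, fi in zip(z, f)
        ]
    return z


if __name__ == "__main__":
    # Assumed latent representation of one observed object-object relationship.
    z_observed = [1.0, -2.0, 0.5]
    rng = random.Random(0)  # seeded for repeatable sampling
    print("ODE forecast:", euler_ode(z_observed, 0.0, 1.0, 100))
    print("SDE sample:  ", euler_maruyama_sde(z_observed, 0.0, 1.0, 100, 0.1, rng))
```

In this toy setting the ODE gives a single deterministic forecast, while repeated SDE calls give different samples whose spread can be read as uncertainty about the future relationship. A practical system would likely replace the hand-written loops with an adaptive solver and decode the forecast latent into relationship predicates.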
