Relational Object-Centric Actor-Critic
Leonid Ugadiarov, Vitaliy Vorobyov, Aleksandr I. Panov
TL;DR
This work introduces Relational Object-Centric Actor-Critic (ROCA), an off-policy, value-based, model-based RL algorithm that embeds a graph-based, object-centric world model inside the critic. ROCA uses a pre-trained SLATE object encoder to produce slot representations $z_t$, which feed a graph neural network that predicts transitions, rewards, and values, enabling planning-like reasoning within an SAC-style framework. Empirical results on 3D CausalWorld Object Reaching and 2D Shapes2D tasks show that ROCA is more sample-efficient and reaches higher performance in challenging multi-object scenarios than object-centric model-free baselines and DreamerV3 variants; the ROCA-CSWM variant highlights the difficulty of integrating contrastive world-model training online. The work demonstrates that graph-based object-centric dynamics can effectively support policy learning. Its limitations include deterministic dynamics and entropy sensitivity, and it points to future work on replacing SLATE with more powerful slot-based encoders to handle more visually complex environments.
Abstract
Advances in unsupervised object-centric representation learning have significantly improved its applicability to downstream tasks. Recent works highlight that disentangled object representations can aid policy learning in image-based, object-centric reinforcement learning tasks. This paper proposes a novel object-centric reinforcement learning algorithm that integrates actor-critic and model-based approaches by incorporating an object-centric world model within the critic. The world model captures the environment's data-generating process by predicting the next state and reward given the current state-action pair, where actions are interventions in the environment. In model-based reinforcement learning, world model learning can be interpreted as a causal induction problem, where the agent must learn the causal relationships underlying the environment's dynamics. We evaluate our method in a simulated 3D robotic environment and a 2D environment with compositional structure. As baselines, we compare against object-centric, model-free actor-critic algorithms and a state-of-the-art monolithic model-based algorithm. While the baselines show comparable performance in easier tasks, our approach outperforms them in more challenging scenarios with a large number of objects or more complex dynamics.
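To make the idea of a world model inside the critic concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the critic composes its Q-value from the world model's outputs as $Q(z, a) = \hat{r}(z, a) + \gamma V(\hat{z}')$, which this section does not state explicitly. The class name `GraphWorldModelCritic`, the single round of message passing, the linear heads, and the random untrained weights are all hypothetical choices for illustration; the slot vectors stand in for encoder outputs such as those SLATE would produce.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GraphWorldModelCritic:
    """Toy critic (assumed form): Q(z, a) = r_hat(z, a) + gamma * V(z_hat_next)."""

    def __init__(self, slot_dim, action_dim, hidden_dim=8, gamma=0.99, seed=0):
        rng = random.Random(seed)
        self.gamma = gamma
        # Edge function: (z_i, z_j, a) -> message vector.
        self.W_edge = rand_matrix(hidden_dim, 2 * slot_dim + action_dim, rng)
        # Node function: (z_i, a, aggregated messages) -> change in slot i.
        self.W_node = rand_matrix(slot_dim, slot_dim + action_dim + hidden_dim, rng)
        # Per-slot reward and value heads, summed over slots.
        self.w_reward = rand_matrix(1, slot_dim + action_dim + hidden_dim, rng)
        self.w_value = rand_matrix(1, slot_dim, rng)

    def _node_inputs(self, slots, action):
        """One round of message passing on the fully connected slot graph."""
        inputs = []
        for i, z_i in enumerate(slots):
            agg = [0.0] * len(self.W_edge)
            for j, z_j in enumerate(slots):
                if i == j:
                    continue
                msg = [math.tanh(m) for m in matvec(self.W_edge, z_i + z_j + action)]
                agg = [s + m for s, m in zip(agg, msg)]
            inputs.append(z_i + action + agg)
        return inputs

    def transition(self, slots, action):
        """Deterministic next-slot prediction as a residual update per slot."""
        node_in = self._node_inputs(slots, action)
        return [[z + d for z, d in zip(z_i, matvec(self.W_node, x))]
                for z_i, x in zip(slots, node_in)]

    def reward(self, slots, action):
        """Scalar reward prediction, summed over per-slot contributions."""
        return sum(matvec(self.w_reward, x)[0] for x in self._node_inputs(slots, action))

    def value(self, slots):
        """Scalar state value, summed over per-slot contributions."""
        return sum(matvec(self.w_value, z)[0] for z in slots)

    def q_value(self, slots, action):
        """Compose reward, transition, and value predictions into a Q-value."""
        next_slots = self.transition(slots, action)
        return self.reward(slots, action) + self.gamma * self.value(next_slots)


# Usage: three 4-dimensional slots and a 2-dimensional action.
critic = GraphWorldModelCritic(slot_dim=4, action_dim=2)
slots = [[0.1 * i + 0.01 * k for k in range(4)] for i in range(3)]
action = [0.5, -0.2]
q = critic.q_value(slots, action)
```

Because messages are summed over neighbours and the reward and value heads are summed over slots, the Q-value in this sketch does not depend on the order of the slots, and the predicted next slots are permuted along with the inputs. That property is one reason graph networks are a natural fit for slot representations, whose ordering is arbitrary.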
