Discovering objects and their relations from entangled scene representations

David Raposo; Adam Santoro; David Barrett; Razvan Pascanu; Timothy Lillicrap; Peter Battaglia

TL;DR

The paper asks how a network can learn relations between objects when the scene representation it receives is entangled. It proposes relation networks (RNs), which apply one shared function to every pair of objects and combine the results with a permutation-invariant aggregation. The authors report three results: RNs learn relational structure from scene descriptions under supervision; they act as a bottleneck that induces factorized object representations from entangled inputs, including pixel-derived representations from a variational autoencoder (VAE); and, combined with a memory-augmented network, they support one-shot relational learning. The authors conclude that RNs are a potentially powerful architecture for problems that require object-relation reasoning.
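
The pairwise-then-aggregate idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`init_mlp`, `mlp`, `relation_network`), the layer sizes, the ReLU MLPs, and the choice to sum over all ordered pairs are assumptions made here for illustration. A shared function `g` scores each pair of objects, the pair outputs are summed, and a second function `f` maps the pooled vector to an output.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (W, b) pairs for an MLP with the given layer sizes (hypothetical helper)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Apply a small ReLU MLP; the last layer is left linear."""
    for k, (W, b) in enumerate(weights):
        x = x @ W + b
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def relation_network(objects, g_weights, f_weights):
    """Sketch of a relation network forward pass.

    objects: array of shape (n_objects, object_dim).
    g is shared across all pairs; summation makes the result
    independent of the order in which objects are listed.
    """
    n = objects.shape[0]
    # Assumption: use every ordered pair (i, j) with i != j.
    i, j = np.where(~np.eye(n, dtype=bool))
    pairs = np.concatenate([objects[i], objects[j]], axis=1)
    relations = mlp(pairs, g_weights)   # shape (n * (n - 1), hidden)
    pooled = relations.sum(axis=0)      # permutation-invariant aggregation
    return mlp(pooled, f_weights)

rng = np.random.default_rng(0)
objects = rng.normal(size=(5, 4))       # 5 objects, 4 features each
g_weights = init_mlp(rng, [8, 16, 16])  # input is two concatenated objects
f_weights = init_mlp(rng, [16, 16, 3])
out = relation_network(objects, g_weights, f_weights)
shuffled = relation_network(objects[rng.permutation(5)], g_weights, f_weights)
print(np.allclose(out, shuffled))
```

Summing over ordered pairs is what makes the output unchanged when the objects are shuffled. Restricting the sum to pairs with i < j would halve the work, but the result would then depend on which object comes first in each concatenated pair unless `g` is itself symmetric.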

Abstract

Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by their underlying causes and semantics. This gives rise to correlated features, such as position, function and shape. Humans exploit knowledge of objects and their relations for learning a wide spectrum of tasks, and more generally when learning the structure underlying observed data. In this work, we introduce relation networks (RNs) - a general purpose neural network architecture for object-relation reasoning. We show that RNs are capable of learning object relations from scene description data. Furthermore, we show that RNs can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder. The model can also be used in conjunction with differentiable memory mechanisms for implicit relation discovery in one-shot learning tasks. Our results suggest that relation networks are a potentially powerful architecture for solving a variety of problems that require object relation reasoning.
