Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers

Awni Altabaa; Taylor Webb; Jonathan Cohen; John Lafferty

TL;DR

An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. The Abstractor is built on relational cross-attention, a variant of attention that disentangles relational information from object-level features.

Abstract

An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where consistent improvements in performance and sample efficiency are observed.
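The contrast between self-attention and relational cross-attention can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single head, relations modeled as inner products of projected objects, row-wise softmax normalization, and one input-independent symbol vector per position. The names `relation_matrix`, `self_attention`, `relational_cross_attention`, `w_q`, `w_k`, `w_v`, and `symbols` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def relation_matrix(x, w_q, w_k):
    """R[i][j] = softmax_j(<x_i W_q, x_j W_k>): pairwise relations as inner products."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    return softmax_rows(matmul(q, transpose(k)))


def self_attention(x, w_q, w_k, w_v):
    """Standard attention: the values are projections of the objects themselves,
    so the output mixes relational and object-level information."""
    return matmul(relation_matrix(x, w_q, w_k), matmul(x, w_v))


def relational_cross_attention(x, symbols, w_q, w_k):
    """Assumed form: the values are input-independent symbols, so the output
    depends on the objects only through their pairwise relations."""
    return matmul(relation_matrix(x, w_q, w_k), symbols)
```

Under these assumptions, rotating every object by the same angle leaves all inner products unchanged when `w_q` and `w_k` are identity matrices. The output of `relational_cross_attention` is then unchanged as well, while the output of `self_attention` changes because its values carry the rotated object features.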

Paper Structure (28 sections, 1 theorem, 16 equations, 8 figures, 2 algorithms)

Introduction
Relational cross-attention and the abstractor module
Modeling relations as inner products
Relational Cross-Attention
Symbol assignment mechanisms
The Abstractor module
Abstractor architectures
Experiments
Discriminative relational tasks
Order relations: modeling asymmetric relations
SET: modeling multi-dimensional relations
SET (continued): comparison to "neuro-symbolic" model
Object-sorting: purely relational sequence-to-sequence tasks
Superior sample-efficiency on relational seq2seq tasks
Ability to generalize to similar tasks
...and 13 more sections

Key Result

Theorem 1

Suppose $\mathcal{X}$ is a compact Euclidean space. Let $r: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be any continuous relation function, and $f: \mathbb{R}^n \to \mathbb{R}^d$ any continuous function. Consider the function $g: \mathcal{X}^n \to \mathbb{R}^{n \times d}$ defined by $g(x_1, \ldots, x_n) = (f(R_1), \ldots, f(R_n))$, where $R_i = (r(x_i, x_j))_{j \in [n]}$ is the vector of $x_i$'s relations with the other objects in ...
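The target function in the theorem can be made concrete with a small sketch. It assumes that $g$ applies $f$ to each relation vector $R_i$ and stacks the results, as the stated types of $f$ and $g$ suggest. The particular choices of `r` (a product of scalars) and `f` (sum and maximum) are hypothetical examples of continuous functions, not taken from the paper.

```python
def relation_vectors(xs, r):
    """R_i = (r(x_i, x_j)) for j in [n]: the relations of x_i with every object."""
    return [[r(xi, xj) for xj in xs] for xi in xs]


def g(xs, r, f):
    """Assumed form of the target function: apply f to each relation vector R_i."""
    return [f(R_i) for R_i in relation_vectors(xs, r)]


def r(x, y):
    """Hypothetical continuous relation on scalars."""
    return x * y


def f(R):
    """Hypothetical continuous map from R^n to R^2."""
    return [sum(R), max(R)]


xs = [1.0, 2.0, 3.0]
print(g(xs, r, f))  # one d-dimensional row per object, here n = 3 and d = 2
```

Each output row depends on the objects only through the values of `r`, which is the sense in which such a function is purely relational.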

Figures (8)

Figure 1: Comparison of relational cross-attention with self-attention. Red represents object-level features, blue represents relational features, and purple represents mixed representations. Relational cross-attention computes relational information disentangled from the features of individual objects.
Figure 2: Examples of Abstractor-based model architectures.
Figure 3: The SET game
Figure 4: Experiments on discriminative relational tasks and comparison to CoRelNet.
Figure 5: Experiments on object-sorting, a purely relational sequence-to-sequence task.
...and 3 more figures

Theorems & Definitions (3)

Remark 1
Theorem 1
proof
