TreeCoders: Trees of Transformers

Pierre Colonna D'Istria; Abdulrahman Altahhan

TL;DR

TreeCoders is a novel family of transformer trees that moves away from traditional linear transformers to complete k-ary trees. The proposed tree transformer model outperforms a size-equivalent linear transformer model 76% of the time over a wide range of tree architectures.

Abstract

In this paper, we introduce TreeCoders, a novel family of transformer trees. We move away from traditional linear transformers to complete k-ary trees. Transformer blocks serve as nodes, and generic classifiers learn to select the best child and route the sequence of tokens to a specific leaf. Because the selectors sit outside the transformer blocks, a variety of architectures can be used as nodes without further modification. Our proposed architecture also supports sparse node activation, owing to the logarithmic complexity of a tree search. We validate our idea by testing a series of decoder-only tree transformers, achieving competitive results across a diverse range of language datasets. Our study demonstrates that the proposed tree transformer model outperforms a size-equivalent linear transformer model 76% of the time over a wide range of tree architectures. Finally, our proposed model naturally lends itself to distributed implementation.
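
The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the level-order node numbering, the function names `route` and `total_nodes`, and the toy selector are all assumptions made for illustration, and a simple placeholder stands in for both the transformer block and the learned classifier at each node. The sketch only shows why a single root-to-leaf pass activates a number of nodes that grows with the tree height rather than with the total node count.

```python
from typing import Callable, List, Sequence


def total_nodes(k: int, height: int) -> int:
    """Node count of a complete k-ary tree with the root at depth 0
    and the leaves at depth `height`."""
    return sum(k ** depth for depth in range(height + 1))


def route(
    tokens: Sequence[int],
    k: int,
    height: int,
    select_child: Callable[[int, Sequence[int]], int],
) -> List[int]:
    """Return the indices of the nodes visited from the root to a leaf.

    Nodes are numbered in level order (an assumption of this sketch),
    so the children of node i are k*i + 1, ..., k*i + k.
    """
    node = 0
    path = [node]
    for _ in range(height):
        # In a real model, the transformer block at `node` would first
        # transform the sequence; here the selector sees the raw tokens.
        choice = select_child(node, tokens)
        if not 0 <= choice < k:
            raise ValueError(f"selector returned {choice}, expected 0..{k - 1}")
        node = k * node + 1 + choice
        path.append(node)
    return path


def make_toy_selector(k: int) -> Callable[[int, Sequence[int]], int]:
    """Hypothetical stand-in for a learned classifier: a deterministic
    function of the node index and the tokens."""
    def select_child(node: int, tokens: Sequence[int]) -> int:
        return (sum(tokens) + node) % k
    return select_child


if __name__ == "__main__":
    k, height = 3, 2
    path = route([1, 2, 3], k, height, make_toy_selector(k))
    print("visited nodes:", path)
    print("active nodes:", len(path), "of", total_nodes(k, height))
```

With k = 3 and height 2, the tree in this sketch has 13 nodes, but one pass visits only 3 of them: the root, one internal node, and one leaf.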
