Freely Long-Thinking Transformer (FraiLT)

Akbay Tabak

TL;DR

FraiLT addresses the cost of scaling language models by recursively reusing a subset of layers, so the model can process its input for longer without growing in size. It is a decoder-only transformer in which iteration-aware blocks and groups revisit the input over multiple passes, guided by learnable iteration encodings and formalized as $X_i = X + E^{iter}(i)$ and $X^{(m, l)} = B_l(X^{(m, l-1)} + E^{iter}_{l}(m))$, where $i$ and $m$ index the iteration and $B_l$ denotes the $l$-th block. The approach is evaluated on the TinyStories dataset with GPT-4-based evaluation, where FraiLT matches or approaches the performance of larger models while using fewer layers and less memory. These results suggest a practical path to more accessible, efficient language models and motivate further study of iteration strategies and encodings in smaller architectures.

Abstract

Freely Long-Thinking Transformer (FraiLT) is an improved transformer model designed to enhance processing capabilities without scaling up size. It utilizes a recursive approach, iterating over a subset of layers multiple times, and introduces iteration encodings to maintain awareness across these cycles. Iteration encoding allows FraiLT to achieve the interpretive depth of larger models in a compact form. When evaluated on a synthetic story dataset, FraiLT outperformed larger models, showcasing its ability to deliver high-quality performance while reducing memory demands. This model represents a step forward towards more efficient and accessible language models.
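The recursive reuse of layers described above can be illustrated with a minimal pure-Python sketch of the group update $X^{(m, l)} = B_l(X^{(m, l-1)} + E^{iter}_{l}(m))$. It is not the author's implementation. The names `make_block`, `frailt_group`, and `effective_depth` are hypothetical, the block is a per-token residual `tanh` map standing in for a real attention-plus-MLP block, and the sketch assumes that each iteration starts from the output of the previous one.

```python
import math
import random


def make_block(dim, rng):
    """Hypothetical stand-in for a block B_l: per-token residual tanh map.

    A real transformer block would contain attention and an MLP; this toy
    version only keeps the property that the weights are created once.
    """
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def block(x):
        out = []
        for tok in x:
            h = [math.tanh(sum(w[i][j] * tok[j] for j in range(dim)))
                 for i in range(dim)]
            out.append([t + hi for t, hi in zip(tok, h)])  # residual add
        return out

    return block


def frailt_group(x, blocks, iter_enc):
    """Apply the same blocks once per iteration m.

    iter_enc[m][l] plays the role of E^iter_l(m): a vector added to every
    token before block l on iteration m. Assumption: iteration m starts
    from the output of iteration m - 1.
    """
    for m in range(len(iter_enc)):
        for l, block in enumerate(blocks):
            enc = iter_enc[m][l]
            x = block([[t + e for t, e in zip(tok, enc)] for tok in x])
    return x


def effective_depth(num_blocks, num_iters):
    """Number of block applications; the block weights are stored only once."""
    return num_blocks * num_iters


rng = random.Random(0)
dim, seq_len, num_blocks, num_iters = 4, 3, 2, 3
blocks = [make_block(dim, rng) for _ in range(num_blocks)]
# Randomly initialised here; in a trained model these would be learned.
iter_enc = [[[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(num_blocks)]
            for _ in range(num_iters)]
x = [[0.1 * (t + 1)] * dim for t in range(seq_len)]
y = frailt_group(x, blocks, iter_enc)
```

In this toy setup two blocks iterated three times are applied six times in total, while only two sets of block weights plus the small iteration encodings are stored. This is the trade of extra computation for lower memory that the abstract describes.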

Paper Structure (15 sections, 2 equations, 7 figures, 5 tables)

Introduction
Model Architecture
FraiLT Block:
FraiLT Group:
FraiLT Transformer:
Related Work on Weight Sharing in Transformer Models
Methodology
Model Setup
Training Procedure
GPT4-Based Evaluation
Results
Validation Loss
GPT-4 Evaluation
Discussions & Conclusions
Follow-Up Work

Figures (7)

Figure 1: FraiLT Block
Figure 2: FraiLT Group
Figure 3: FraiLT Transformer
Figure 4: The final validation loss for each model across different embedding dimensions for 1 and 2-layer models
Figure 5: The final validation loss for each model across different embedding dimensions for 4 and 8-layer models
...and 2 more figures
