On the Design Space Between Transformers and Recursive Neural Nets

Jishnu Ray Chowdhury, Cornelia Caragea

TL;DR

This work formalizes a unifying recursive framework that links Recursive Neural Networks (RvNNs) and Transformers via two bridge models: Neural Data Routers (NDR) and Continuous Recursive Neural Networks (CRvNN). By expressing both approaches as repeated applications of a recursive function Rec(H^t,E^t) built from Retrieve and Compose components, the authors show that parts of their mechanics are structurally equivalent (geometric attention vs. neighbor retrieval and gating), while highlighting crucial differences in attention heads, masking, and dynamic halting. Empirically, CRvNN demonstrates strong out-of-distribution generalization on algorithmic tasks such as ListOps and Logical Inference, often outperforming NDR and vanilla Transformers. NDR offers flexible local attention and gating, but struggles to generalize to longer inputs and requires its depth to be set in advance. The paper concludes that the design space between RvNNs and Transformers involves a trade-off between inductive bias and flexibility, and proposes future work on dynamic halting, adaptive recursion, and memory efficiency to further exploit these bridge models. These insights clarify how to design architectures capable of robust, scalable algorithmic reasoning beyond standard fixed-depth Transformers.

Abstract

In this paper, we study two classes of models, Recursive Neural Networks (RvNNs) and Transformers, and show that a tight connection between them emerges from the development of two recent models: Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR). On one hand, CRvNN pushes the boundaries of traditional RvNNs, relaxing their discrete structure-wise composition, and ends up with a Transformer-like structure. On the other hand, NDR constrains the original Transformer to induce a better structural inductive bias, ending up with a model that is close to CRvNN. Both models, CRvNN and NDR, show strong performance on algorithmic tasks and in generalization settings where simpler forms of RvNNs and Transformers fail. We explore these "bridge" models in the design space between RvNNs and Transformers, formalize their tight connections, discuss their limitations, and propose ideas for future research.
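
The TL;DR names geometric attention as the retrieval mechanism on the NDR side. As a rough illustration only, the following is a minimal sketch of that idea under stated assumptions: each query scans the other positions from nearest to farthest and attends to the first one that matches, so a position's weight is its own match probability times the probability that no closer position matched. The function names, the left-first tie-breaking, and the exclusion of the query's own position are assumptions of this sketch, not details taken from the text above.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a match probability."""
    return 1.0 / (1.0 + math.exp(-x))


def geometric_attention(logits):
    """Sketch of geometric attention over a square matrix of raw scores.

    logits[i][j] is the raw match score of query position i against
    position j. Returns weights[i][j], the attention of i on j.
    """
    n = len(logits)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Visit the other positions from nearest to farthest.
        # Assumption: ties in distance are broken left first, and the
        # query's own position is skipped.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (abs(i - j), j),
        )
        no_closer_match = 1.0  # probability that nothing closer matched
        for j in order:
            p = sigmoid(logits[i][j])
            weights[i][j] = p * no_closer_match
            no_closer_match *= 1.0 - p
    return weights


# Query 0 matches positions 1 and 2 equally strongly; the nearer one wins.
example = [
    [0.0, 10.0, 10.0],
    [-10.0, 0.0, -10.0],
    [-10.0, -10.0, 0.0],
]
result = geometric_attention(example)
```

In this sketch each row sums to at most one rather than exactly one: the remainder is the probability that no position matched, which is one way such a mechanism can leave a position unchanged instead of forcing it to attend somewhere.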

Paper Structure (10 sections, 17 equations, 3 tables)