Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai, Nik Bear Brown
TL;DR
This paper addresses the challenge of achieving robust, generalizable autonomy in robotics by surveying how Transformers are deployed across perception, planning, and control. It analyzes three main deployment patterns: foundation-model–driven human-robot interaction and collaboration (HRI/HRC), Transformer-augmented DRL for long-horizon tasks, and perception/planning/control pipelines enhanced by self-attention and multimodal fusion, including efficient and sparse variants. Key contributions include a taxonomy of architectural innovations (efficient, multimodal, sparse/adaptive), a synthesis of foundation models, DRL integrations, and supporting datasets (e.g., Decision Transformer, Trajectory Transformer, TransformerMPC, OpenX-Embodiment datasets), and a discussion of challenges such as sim-to-real transfer, safety, and real-time constraints. The work highlights the practical impact of Transformers in enabling zero-shot and few-shot generalization, improved sample efficiency, and scalable perception–planning–control, while charting directions for edge deployment, cross-embodiment generalization, and richer real-world robotics datasets. Because self-attention in vanilla Transformers costs $O(n^2)$ in the sequence length $n$, real-time robotic systems motivate a push toward linear or sparse alternatives.
Abstract
The introduction of the Transformer architecture has brought about significant breakthroughs in Deep Learning (DL), particularly within Natural Language Processing (NLP). Since their inception, Transformers have outperformed many traditional neural network architectures owing to their self-attention mechanism and their scalability across applications. In this paper, we cover the use of Transformers in robotics. We survey recent advances and trends in Transformer architectures and examine their integration into robotic perception, planning, and control for autonomous systems. We also review past and recent work on Transformers as pre-trained foundation models in robotics and on the integration of Transformers with Deep Reinforcement Learning (DRL) for autonomous systems. We discuss how different Transformer variants are being adapted in robotics for reliable planning and perception, improved human-robot interaction, long-horizon decision-making, and generalization. Finally, we address limitations and open challenges and offer suggestions for future research directions.
