ROS 2 on a Chip, Achieving Brain-Like Speeds and Efficiency in Robotic Networking
Víctor Mayoral-Vilches, Juan Manuel Reina-Muñoz, Martiño Crespo-Álvarez, David Mayoral-Vilches
TL;DR
The paper addresses latency and energy inefficiency in ROS 2 networking for real-time robotics by presenting ROS 2 on a Chip, a hardware-accelerated implementation prototyped on FPGAs. The ROBOTCORE architecture maps ROS 2 abstractions, RTPS, and UDP/IP into hardware while remaining interoperable with DDS, creating deterministic datapaths that bypass software overhead. Key results show mean latencies near $2\,\mu s$ and maximum latencies as low as $11\,\mu s$, with improvements of more than $30{,}000\times$ in peak latency and more than $500\times$ in energy efficiency over CPU-based ROS 2. These findings demonstrate brain-like, deterministic robotic networking and motivate broader adoption of hardware-accelerated ROS in both industrial and teleoperation contexts. The paper also outlines future work on broader benchmarking, scalability, and security.
Abstract
The Robot Operating System (ROS) publish-subscribe (pubsub) model has played a pivotal role in developing sophisticated robotic applications. However, the complexities and real-time demands of modern robotics necessitate more efficient communication solutions that are deterministic and isochronous. This article introduces a groundbreaking approach: embedding the ROS 2 message-passing infrastructure directly onto a specialized hardware chip, significantly enhancing speed and efficiency in robotic communications. Our FPGA prototypes of the chip design can send or receive packets in less than 2.5 microseconds, accelerating networking communications by more than 62x on average and reducing energy consumption by more than 500x when compared to traditional ROS 2 software implementations on modern CPUs. Additionally, the design reduces maximum latency in ROS 2 networking communication by more than 30,000x. In situations of peak latency, our design guarantees an isochronous response within 11 microseconds, a stark improvement over the potential hundreds of milliseconds reported by modern CPU systems under similar conditions.
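As a rough consistency check on the figures quoted above, the sketch below multiplies each accelerated latency by its stated speedup to recover the CPU-side latency it implies. It assumes the quoted bounds ("less than 2.5 microseconds", "more than 62x", "more than 30,000x") can be treated as exact values, so the results are order-of-magnitude estimates only and are not measurements from the paper. The function name `implied_baseline_us` is hypothetical.

```python
# Back-of-envelope check of the abstract's figures.
# Assumption: the quoted bounds are treated as exact values, so the
# outputs are order-of-magnitude estimates, not reported measurements.

US_PER_MS = 1_000.0  # microseconds per millisecond


def implied_baseline_us(accelerated_us: float, speedup: float) -> float:
    """Return the CPU-side latency implied by an accelerated latency and a speedup factor."""
    return accelerated_us * speedup


# Average case: under 2.5 us on the FPGA prototype, more than 62x faster.
mean_cpu_us = implied_baseline_us(2.5, 62)

# Peak case: response within 11 us, more than 30,000x lower maximum latency.
max_cpu_ms = implied_baseline_us(11, 30_000) / US_PER_MS

print(f"implied mean CPU latency: ~{mean_cpu_us:.0f} us")
print(f"implied max CPU latency:  ~{max_cpu_ms:.0f} ms")
```

Under these assumptions the implied CPU maximum is about 330 ms, which agrees with the "hundreds of milliseconds" stated for CPU systems, and the implied CPU mean is on the order of 155 microseconds.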
