A Faster and More Reliable Middleware for Autonomous Driving Systems

Yuankai He, Weisong Shi

TL;DR

The paper tackles perception-to-decision latency in high-speed autonomous vehicles by targeting intra-host messaging bottlenecks. It introduces Sensor-in-Memory (SIM), a shared-memory transport that keeps sensor data in native layouts and uses lock-free, double-buffered data planes, bypassing (de)serialization while remaining compatible with ROS 2. On an NVIDIA Jetson Orin Nano and on a production vehicle, SIM substantially reduces transport and tail latency (up to ~98% lower transport latency and ~95% lower mean latency). It also raises application throughput in Autoware.Universe (e.g., localization frequency rises from 7.5 Hz to 9.5 Hz) and cuts average perception-to-decision latency from about 522 ms to 290 ms. The results indicate that SIM can markedly improve perception-to-control responsiveness and safety margins while preserving the open-source ROS 2 ecosystem and allowing incremental deployment along intra-host paths.
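The "lock-free, double-buffered data plane" idea can be illustrated with a minimal single-writer sketch. This is not SIM's implementation: the class name `DoubleBuffer`, the per-slot version counter (a seqlock pattern), and the fixed-size `uint32_t` payload are our assumptions, and the buffers live in ordinary process memory rather than a shared-memory segment. Readers always get the newest complete frame; old frames are overwritten rather than queued.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical sketch (not SIM's code): single-writer, lock-free double buffer.
// Each slot has a version counter: odd while the writer fills it, even when
// stable. Payload words are atomics so concurrent access is race-free in C++.
template <std::size_t N>
class DoubleBuffer {
 public:
  struct Sample {
    std::uint64_t seq;                  // frame sequence number (starts at 1)
    std::array<std::uint32_t, N> data;  // copied payload
  };

  // Must be called from a single writer thread.
  void write(const std::array<std::uint32_t, N>& payload) {
    const std::uint64_t frame = next_frame_++;
    Slot& s = slots_[frame % 2];  // alternate between the two slots
    const std::uint64_t v = s.version.load(std::memory_order_relaxed);
    s.version.store(v + 1, std::memory_order_relaxed);  // odd: write in progress
    std::atomic_thread_fence(std::memory_order_release);
    for (std::size_t i = 0; i < N; ++i)
      s.words[i].store(payload[i], std::memory_order_relaxed);
    s.frame.store(frame, std::memory_order_relaxed);
    s.version.store(v + 2, std::memory_order_release);  // even: slot stable
    latest_.store(frame, std::memory_order_release);    // publish newest frame
  }

  // Returns the newest consistent frame, or nullopt if nothing has been
  // published yet or every attempt overlapped a concurrent write.
  std::optional<Sample> read_latest(int max_retries = 8) const {
    assert(max_retries > 0);
    for (int attempt = 0; attempt < max_retries; ++attempt) {
      const std::uint64_t frame = latest_.load(std::memory_order_acquire);
      if (frame == 0) return std::nullopt;  // writer has not published yet
      const Slot& s = slots_[frame % 2];
      const std::uint64_t v0 = s.version.load(std::memory_order_acquire);
      if (v0 & 1u) continue;  // writer is refilling this slot; retry
      Sample out{};
      out.seq = s.frame.load(std::memory_order_relaxed);
      for (std::size_t i = 0; i < N; ++i)
        out.data[i] = s.words[i].load(std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_acquire);
      if (s.version.load(std::memory_order_relaxed) == v0) return out;  // not torn
    }
    return std::nullopt;
  }

 private:
  struct Slot {
    std::atomic<std::uint64_t> version{0};
    std::atomic<std::uint64_t> frame{0};
    std::array<std::atomic<std::uint32_t>, N> words{};
  };
  std::array<Slot, 2> slots_{};
  std::atomic<std::uint64_t> latest_{0};  // 0 means "nothing published"
  std::uint64_t next_frame_ = 1;          // writer-private counter
};
```

A slow reader simply skips intermediate frames, which matches the freshness-over-completeness trade-off described in the abstract. Placing this layout in a shared-memory mapping would additionally require the atomics to be lock-free and address-free, which the sketch does not check.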

Abstract

Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
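The braking-distance figure above is consistent with a simple reaction-distance model, d = v · Δt, in which the saved latency is time the vehicle would otherwise travel at constant speed before braking begins. That model and the helper names below are our assumptions; the sketch only reproduces the arithmetic.

```cpp
// Hypothetical helpers (not from the paper) for the reaction-distance arithmetic.
constexpr double kMetersPerMile = 1609.344;  // exact by definition
constexpr double kMetersPerFoot = 0.3048;    // exact by definition

// Converts miles per hour to meters per second.
constexpr double mph_to_mps(double mph) { return mph * kMetersPerMile / 3600.0; }

// Distance (m) covered at constant speed during a latency interval: d = v * dt.
constexpr double latency_distance_m(double speed_mph, double latency_ms) {
  return mph_to_mps(speed_mph) * (latency_ms / 1000.0);
}

// Distance (m) no longer traveled when latency drops from before_ms to after_ms.
constexpr double braking_distance_saved_m(double speed_mph, double before_ms,
                                          double after_ms) {
  return latency_distance_m(speed_mph, before_ms - after_ms);
}
```

With Δt = 521.91 - 290.26 = 231.65 ms and 40 mph ≈ 17.88 m/s, `braking_distance_saved_m(40.0, 521.91, 290.26)` evaluates to about 4.14 m, or about 13.6 ft after dividing by `kMetersPerFoot`, matching the abstract.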

Paper Structure

This paper contains 47 sections, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Top: Data Flow and the Role of DDS in an Autonomous Vehicle Bottom: SIM Overview
  • Figure 2:
  • Figure 3: LiDAR latency (1W→1R, linear scale), Orin Nano. Whiskers show min–max; filled dot is the mean; long tick is p95; short dashed tick is p99. Transport-only; best-effort QoS; zero-copy enabled; pinned threads; fixed clocks; no scheduling priority elevation. Mean (95% CI across runs, n = 5): SIM 0.226 [0.219, 0.234]; Fast DDS 0.685 [0.650, 0.720]; Zenoh 0.839 [0.773, 0.905]. With SCHED_FIFO at priority 99, SIM's mean latency decreases to 0.158 [0.150, 0.164].
  • Figure 4: LiDAR latency (1W→10R, linear scale), Orin Nano. Whiskers show min–max; filled dot is the mean; long tick is p95; short dashed tick is p99. Transport-only; best-effort QoS; zero-copy enabled; pinned threads; fixed clocks; no scheduling priority elevation. Mean (95% CI across runs, n = 5): SIM 0.542 [0.513, 0.570]; Fast DDS 1.593 [1.583, 1.603]; Zenoh 2.136 [2.077, 2.196]. With SCHED_FIFO at priority 99, SIM's mean latency decreases to 0.231 [0.215, 0.246].
  • Figure 5: Camera latency (1W→1R, log scale), Orin Nano. Whiskers show min–max; filled dot is the mean; long tick is p95; short dashed tick is p99. Transport-only; best-effort QoS; zero-copy enabled; pinned threads; fixed clocks; no scheduling priority elevation. Mean (95% CI across runs, n = 5): SIM 0.253 [0.244, 0.261]; Fast DDS 3.40 [2.881, 3.920]; Zenoh 3.984 [3.877, 4.091]. With SCHED_FIFO at priority 99, SIM's mean latency decreases to 0.204 [0.195, 0.212].
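The relative mean-latency reductions implied by the captions of Figures 3 to 5 can be checked with a few lines of arithmetic. The captions do not state units, so the values are used as given; the struct and function names are hypothetical.

```cpp
// Relative reduction of SIM's mean latency versus a baseline, in percent.
constexpr double reduction_pct(double baseline_mean, double sim_mean) {
  return 100.0 * (baseline_mean - sim_mean) / baseline_mean;
}

// Mean latencies copied from the captions of Figures 3-5
// (default scheduling, i.e. without SCHED_FIFO; units as in the figures).
struct CaptionMeans {
  const char* scenario;
  double sim, fast_dds, zenoh;
};

constexpr CaptionMeans kMeans[] = {
    {"LiDAR 1W->1R", 0.226, 0.685, 0.839},
    {"LiDAR 1W->10R", 0.542, 1.593, 2.136},
    {"Camera 1W->1R", 0.253, 3.40, 3.984},
};
```

Evaluated on these values, SIM's mean is about 67% and 73% lower than Fast DDS and Zenoh for LiDAR 1W→1R, about 66% and 75% lower for LiDAR 1W→10R, and about 93% and 94% lower for camera 1W→1R.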