Nonlinear Performance Degradation of Vision-Based Teleoperation under Network Latency

Aws Khalil, Jaerock Kwon

TL;DR

This work investigates the nonlinear degradation of closed-loop stability in camera-based lane keeping under varying network delays and develops the Latency-Aware Vision Teleoperation testbed (LAVT), a research-oriented ROS 2 framework that enables precise, distributed one-way latency measurement and reproducible delay injection.

Abstract

Teleoperation is increasingly being adopted as a critical fallback for autonomous vehicles. However, the impact of network latency on vision-based, perception-driven control remains insufficiently studied. The present work investigates the nonlinear degradation of closed-loop stability in camera-based lane keeping under varying network delays. To conduct this study, we developed the Latency-Aware Vision Teleoperation testbed (LAVT), a research-oriented ROS 2 framework that enables precise, distributed one-way latency measurement and reproducible delay injection. Using LAVT, we performed 180 closed-loop experiments in simulation across diverse road geometries. Our findings reveal a sharp collapse in stability between 150 ms and 225 ms of one-way perception latency, where route completion rates drop from 100% to below 50% as oscillatory instability and phase-lag effects emerge. We further demonstrate that additional control-channel delay compounds these effects, significantly accelerating system failure even under constant visual latency. By combining this systematic empirical characterization with the LAVT testbed, this work provides quantitative insights into perception-driven instability and establishes a reproducible baseline for future latency-compensation and predictive control strategies. Project page, supplementary video, and code are available at https://bimilab.github.io/paper-LAVT

Paper Structure (37 sections, 5 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Architecture of the Latency-Aware Vision Teleoperation testbed (LAVT). The system operates in a distributed client–server configuration. On the vehicle side (server), a forward-facing camera publishes a local ROS 2 image topic. A video bridge node subscribes to this topic and transmits frames via a dedicated RTP/H.264 stream (GStreamer) over UDP. On the remote side (client), a corresponding video bridge decodes the stream and republishes the received frames as a local ROS 2 image topic for downstream processing by either a teleoperator interface or a vision-based autonomy module. Control commands are exchanged as ROS 2 topics over rmw_zenoh, with independent Zenoh routers running on both server and client and connected via configured peer-to-peer sessions. The blue path denotes the perception channel with one-way video latency $\tau_v$, while the orange path denotes the actuation channel with one-way control latency $\tau_c$. Linux Traffic Control and Network Emulator (TC NetEm) is applied independently on each direction to inject controlled delay. Chrony provides system-level clock synchronization between machines, enabling accurate offset estimation and one-way latency computation using embedded frame timestamps. The resulting architecture forms a fully closed-loop teleoperation system suitable for systematic latency characterization in both simulation and real-vehicle deployments.
  • Figure 2: A full-scale drive-by-wire (DBW) research vehicle integrated with LAVT for real-world deployment. The DBW system includes brake, throttle, steering, and shift-by-wire control modules. The platform is equipped with a multi-sensor suite including LiDAR, RGB cameras, and GPS. While quantitative latency experiments in this study are conducted in simulation, the complete teleoperation and streaming stack has been integrated with this vehicle without architectural modification.
  • Figure 3: Client-side autonomy module in LAVT. The module receives a compressed video stream and vehicle speed from the server. A deterministic classical vision pipeline estimates lane boundaries in a bird’s-eye-view (BEV) representation, from which a centerline is computed. Lateral control is generated using a speed-adaptive Pure Pursuit controller, and longitudinal motion is regulated by a PI speed controller. Safety mechanisms include frame staleness detection and safe-stop behavior.
  • Figure 4: Predefined evaluation routes in CARLA Town04. Route A consists of a short straight segment followed by a 90$^\circ$ right turn and then a long straight segment. Route B starts with a 90$^\circ$ left turn followed by a short straight segment and a sustained left-hand curvature. Route C contains a short straight segment followed by a right-hand curvature. Colored arrows indicate the nominal driving direction. Key subsets (black boxes) isolate steering-intensive segments used for focused stability analysis.
  • Figure 5: Verification of injected one-way latencies across all routes (Town04) for L0–L5. For each condition, colored boxplots show video latency ($\tau_v$) and control-command latency ($\tau_c$) on the same axis. L0–L3 increase video latency only. L4 and L5 reuse the video-delay profiles of L2 and L3, respectively, while injecting additional control-command latency (visible as a distinct $\tau_c$ distribution at L4–L5).
  • ...and 2 more figures
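
The Figure 1 caption describes one-way latency computation from embedded frame timestamps combined with a clock-offset estimate. The following is a minimal sketch of that arithmetic, not the LAVT implementation: the names `FrameStamp`, `one_way_latency_ms`, and `clock_offset_s` are hypothetical, and the sign convention for the offset (client clock minus server clock) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FrameStamp:
    """Timestamps for one video frame, in seconds (hypothetical structure)."""
    sent_server: float      # server clock reading embedded in the frame at send time
    received_client: float  # client clock reading when the frame is received


def one_way_latency_ms(stamp: FrameStamp, clock_offset_s: float) -> float:
    """Return the one-way latency of a frame in milliseconds.

    clock_offset_s is assumed to be (client clock - server clock).
    Subtracting it expresses the receive time in the server's time base,
    so both timestamps can be compared directly.
    """
    received_in_server_time = stamp.received_client - clock_offset_s
    return (received_in_server_time - stamp.sent_server) * 1000.0


if __name__ == "__main__":
    # Illustrative numbers only: the client clock runs 3 ms ahead of the server.
    stamp = FrameStamp(sent_server=100.000, received_client=100.153)
    print(f"tau_v = {one_way_latency_ms(stamp, clock_offset_s=0.003):.1f} ms")
```

With these illustrative numbers the raw timestamp difference is 153 ms, and removing the assumed 3 ms offset gives a latency of about 150 ms. The same calculation would apply to the control channel ($\tau_c$) with the roles of sender and receiver swapped.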