
ROSClaw: An OpenClaw ROS 2 Framework for Agentic Robot Control and Interaction

Irvin Steve Cardenas, Marcus Anthony Arnett, Natalie Catherine Yeo, Lucky Sah, Jong-Hoon Kim

Abstract

Foundation models can endow robots with open-ended reasoning, language understanding, and adaptive planning, yet connecting a model to a physical robot today requires bespoke integration that couples perception, actuation, and safety to a single model and platform. We present ROSClaw, a model-agnostic executive layer that integrates the OpenClaw agent runtime with ROS 2, enabling any foundation model to perceive, reason about, and act on any ROS-enabled robot through (i) dynamic capability discovery with standardized affordance injection, (ii) multimodal observation normalization, (iii) pre-execution action validation within a configurable safety envelope, and (iv) structured audit logging. Swapping model backends or robot platforms is a configuration change; tool schemas, safety enforcement, and provenance logging remain invariant. We deploy ROSClaw on three platforms (wheeled, quadruped, humanoid) with four foundation-model backends. Under this controlled substrate, models exhibit up to 4.8× differences in out-of-policy action proposal rates (3.4× among frontier models alone) and produce qualitatively distinct physical behaviors from identical commands. A cross-framework parity protocol against ROSA confirms that executive-layer design, not just prompt wording, significantly affects both task completion and safety behavior, establishing ROSClaw as both practical agentic-robot infrastructure and a reproducible measurement instrument for embodied AI.
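The abstract's point (iii), pre-execution action validation within a configurable safety envelope, can be illustrated with a minimal sketch. The names (`SafetyEnvelope`, `validate_cmd_vel`, the ALLOW/BLOCK verdicts echoing Figure 4's "validator-BLOCK" terminology) and the numeric limits are illustrative assumptions, not the paper's implementation; the limits shown match the TurtleBot3 Burger's documented maxima.

```python
from dataclasses import dataclass

@dataclass
class SafetyEnvelope:
    # Illustrative limits (TurtleBot3 Burger documented maxima)
    v_max: float = 0.22   # max linear speed, m/s
    w_max: float = 2.84   # max angular speed, rad/s

@dataclass
class Decision:
    verdict: str          # "ALLOW" or "BLOCK"
    reason: str = ""

def validate_cmd_vel(v: float, w: float, env: SafetyEnvelope) -> Decision:
    """Pre-execution check: intercept any velocity command outside the
    envelope before it is published to the ROS 2 graph."""
    if abs(v) > env.v_max:
        return Decision("BLOCK", f"linear {v:.2f} m/s exceeds {env.v_max} m/s")
    if abs(w) > env.w_max:
        return Decision("BLOCK", f"angular {w:.2f} rad/s exceeds {env.w_max} rad/s")
    return Decision("ALLOW")
```

Because the check runs before publication, a blocked proposal never reaches the robot; only the structured `Decision` is written to the audit log.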

Paper Structure

This paper contains 29 sections, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: ROSClaw architecture. The OpenClaw runtime swaps cognition backends; the ROSClaw plugin enforces the executive-layer contract (tools, safety, context injection, vision). Three transport modes connect to the ROS 2 graph. User interfaces (chat, web, CLI, API) connect upstream via the gateway.
  • Figure 2: ROSClaw deployments and instrumentation (composite). (a) Unitree G1 humanoid and Go2 quadruped controlled via the OpenClaw mobile chat interface. (b) Example backend view showing synchronized camera input, bridged visual-grounding output, and simulation visualization used for observation normalization and audit logging. (c) Go2 human-interaction demonstration and navigation visualization: top shows the robot's RViz/Nav2 view while ROSClaw sets goal points via the Nav2 action interface; bottom shows the Go2 operating near a volunteer participant in a bounded test area under the same executive-layer contract. Media were captured with participant consent; no personal data were collected or analyzed; identifying details are obscured for anonymous review.
  • Figure 3: Representative trajectories for "Shake and Bake" on TurtleBot3. Top: x-y odometry paths; dots mark 1 s intervals. Bottom: commanded linear velocity $v(t)$ with the safety limit $v_{\max}$ shown. Each model produces a qualitatively distinct physical behavior from the same start pose under identical affordances and safety limits.
  • Figure 4: Prompt-level out-of-policy proposal rates on TurtleBot3 (10 safety tasks $\times$ 10 trials per model). Error bars show 95% Wilson CIs. A prompt counts as an attempt if it elicits at least one validator-BLOCK decision; all blocked actions were intercepted pre-publication. The divergence persists among frontier models (shaded): the 3.4$\times$ spread across Claude, GPT-5.2, and Gemini confirms the effect is not driven solely by Llama 4.