ARRC: Advanced Reasoning Robot Control - Knowledge-Driven Autonomous Manipulation Using Retrieval-Augmented Generation

Eugene Vorobiov, Ammar Jaleel Mahmood, Salim Rezvani, Robin Chhabra

TL;DR

The paper tackles translating natural-language robot instructions into safe, local robot control. It introduces ARRC, a retrieval-augmented planning pipeline that uses an external knowledge base and RGB-D perception to generate JSON-formatted action plans executed on a real robot with safety gates. Key contributions include a structured robot-centric knowledge base with movement primitives and safety heuristics, a RAG prompting framework producing actionable plans, and a reproducible evaluation protocol on tabletop tasks. Results show that RAG-based planning enhances plan validity and adaptability while keeping perception and control local, enabling robust, safe manipulation with modest latency.

Abstract

We present ARRC (Advanced Reasoning Robot Control), a practical system that connects natural-language instructions to safe local robotic control by combining Retrieval-Augmented Generation (RAG) with RGB-D perception and guarded execution on an affordable robot arm. The system indexes curated robot knowledge (movement patterns, task templates, and safety heuristics) in a vector database, retrieves task-relevant context for each instruction, and conditions a large language model (LLM) to produce JSON-structured action plans. Plans are executed on a UFactory xArm 850 fitted with a Dynamixel-driven parallel gripper and an Intel RealSense D435 camera. Perception uses AprilTag detections fused with depth to produce object-centric metric poses. Execution is enforced via software safety gates: workspace bounds, speed and force caps, timeouts, and bounded retries. We describe the architecture, knowledge design, integration choices, and a reproducible evaluation protocol for tabletop scan, approach, and pick-place tasks. Experimental results demonstrate the efficacy of the proposed approach. Our design shows that RAG-based planning can substantially improve plan validity and adaptability while keeping perception and low-level control local to the robot.
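The abstract lists the software safety gates only by name. As a rough illustration of how a JSON plan might be checked before execution, the sketch below validates workspace bounds and a speed cap. The plan schema (`steps`, `action`, `target_mm`, `speed_mm_s`), the action names, and every numeric limit are assumptions made for this example; the paper's actual schema and thresholds are not given in this section.

```python
import json
from dataclasses import dataclass

# All names, fields, and numeric limits below are illustrative assumptions,
# not the schema or thresholds used by ARRC.
ALLOWED_ACTIONS = {"scan", "move_to", "grasp", "release"}


@dataclass(frozen=True)
class SafetyLimits:
    x_mm: tuple = (150.0, 650.0)
    y_mm: tuple = (-400.0, 400.0)
    z_mm: tuple = (20.0, 500.0)
    max_speed_mm_s: float = 200.0
    max_steps: int = 20


def validate_plan(plan_json: str, limits: SafetyLimits = SafetyLimits()) -> list:
    """Return a list of violations; an empty list means the plan may run."""
    try:
        plan = json.loads(plan_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    steps = plan.get("steps") if isinstance(plan, dict) else None
    if not isinstance(steps, list) or not steps:
        return ["plan must contain a non-empty 'steps' list"]

    errors = []
    if len(steps) > limits.max_steps:
        errors.append(f"too many steps: {len(steps)} > {limits.max_steps}")

    bounds = (("x", limits.x_mm), ("y", limits.y_mm), ("z", limits.z_mm))
    for i, step in enumerate(steps):
        action = step.get("action") if isinstance(step, dict) else None
        if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
            errors.append(f"step {i}: unknown or malformed action")
            continue
        if action != "move_to":
            continue  # only motion steps carry a target and a speed here

        target = step.get("target_mm")
        if not (isinstance(target, list) and len(target) == 3
                and all(isinstance(v, (int, float)) for v in target)):
            errors.append(f"step {i}: target_mm must be [x, y, z]")
            continue

        # Workspace gate: every coordinate must lie inside its interval.
        for (axis, (lo, hi)), value in zip(bounds, target):
            if not lo <= value <= hi:
                errors.append(f"step {i}: {axis}={value} outside [{lo}, {hi}]")

        # Speed gate: a missing speed defaults to the cap itself.
        speed = step.get("speed_mm_s", limits.max_speed_mm_s)
        if not isinstance(speed, (int, float)) or not 0 < speed <= limits.max_speed_mm_s:
            errors.append(f"step {i}: speed {speed} outside (0, {limits.max_speed_mm_s}]")
    return errors
```

Returning a list of violations rather than raising on the first one means the full list could be fed back to the planner as context for a corrected plan. Force caps, timeouts, and bounded retries are runtime checks and are not covered by this static validation.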

Paper Structure

This paper contains 27 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: High-level architecture. Perception produces object-centric observations; the RAG planner retrieves task knowledge and synthesizes a JSON plan; the executor validates and executes actions through the xArm SDK with safety gates.
  • Figure 2: A representative scene where the robot is performing manipulation.
  • Figure 3: Adaptive reasoning flow for "pick up the screwdriver" with failure recovery. The system falls back from horizontal to arc scanning and, when the object remains undetected, reprompts the RAG system with the failure context to generate alternative strategies.
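
The recovery flow described in the Figure 3 caption can be sketched as a bounded loop: try each scan strategy in order, and if all fail, ask the planner for new strategies while passing along the failure log. This is a minimal sketch under stated assumptions. The function `locate_with_fallback`, the `replan` callback, and the strategy names are hypothetical and stand in for the real perception and RAG components, which this section does not specify.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pose = Tuple[float, float, float]                      # assumed (x, y, z) in metres
Scan = Tuple[str, Callable[[str], Optional[Pose]]]     # (strategy name, scan function)


def locate_with_fallback(
    target: str,
    scans: Sequence[Scan],
    replan: Callable[[str, List[str]], Sequence[Scan]],
    max_replans: int = 1,
) -> Tuple[Optional[Pose], List[str]]:
    """Try scans in order; if all fail, replan with the failure log (bounded)."""
    failures: List[str] = []
    for attempt in range(max_replans + 1):
        for name, scan in scans:
            pose = scan(target)
            if pose is not None:
                return pose, failures
            failures.append(f"{name}: '{target}' not detected")
        if attempt < max_replans:
            # Hypothetical hook: the planner sees what was already tried.
            scans = replan(target, list(failures))
    return None, failures
```

With `scans` set to a horizontal scan followed by an arc scan, the loop reproduces the fallback order in the caption, and `max_replans` plays the role of the bounded retries mentioned in the abstract.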