RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset

Yongzhong Wang, Keyu Zhu, Yong Zhong, Liqiong Wang, Jinyu Yang, Feng Zheng

Abstract

The acquisition of large-scale physical interaction data, a critical prerequisite for modern robot learning, is severely bottlenecked by the cost and limited scalability of human-in-the-loop collection paradigms. To address this, we introduce Robust Autonomous Data Acquisition for Robotics (RADAR), a fully autonomous, closed-loop data generation engine that removes human intervention from the collection cycle. RADAR divides the workload into a four-module pipeline. Anchored by 2-5 3D human demonstrations used as geometric priors, a Vision-Language Model (VLM) first generates scene-relevant tasks through semantic object grounding and skill retrieval. Next, a Graph Neural Network (GNN) policy translates these subtasks into physical actions via in-context imitation learning. After execution, the VLM evaluates task success with a structured Visual Question Answering (VQA) pipeline. Finally, to remove the bottleneck of manual resets, a Finite State Machine (FSM) orchestrates autonomous environment reset and asymmetric data routing. By planning forward and reverse (reset) sequences simultaneously, with reset steps ordered in a strict Last-In, First-Out (LIFO) causal sequence, the system restores unstructured workspaces and recovers from execution failures. This continuous brain-cerebellum synergy turns data collection into a self-sustaining process. Extensive evaluations demonstrate RADAR's versatility. In simulation, the framework achieves success rates of up to 90% on complex, long-horizon tasks on which traditional baselines drop to near-zero performance. In real-world deployments, the system reliably executes diverse, contact-rich skills (e.g., deformable object manipulation) through few-shot adaptation without domain-specific fine-tuning, providing a scalable paradigm for robotic data acquisition.
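The LIFO ordering of the reset plan can be illustrated with a minimal sketch. It assumes a hypothetical table `INVERSE_SKILL` that maps each forward skill to a skill that undoes it, and a hypothetical `Step` record; neither is taken from RADAR, and this is not the authors' planner. Forward steps are pushed onto a stack as they are planned, and the reset plan pops them, so the last change made to the scene is the first one undone.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical skill -> undo-skill table; illustrative only, not RADAR's skill library.
INVERSE_SKILL: Dict[str, str] = {
    "push": "pull_back",
    "stack": "unstack",
    "close": "open",
    "open": "close",
}


@dataclass(frozen=True)
class Step:
    skill: str   # skill name, e.g. "push"
    target: str  # grounded object, e.g. "red_block"


def plan_forward_and_reset(
    chain: Sequence[Tuple[str, str]],
) -> Tuple[List[Step], List[Step]]:
    """Build a forward plan and its LIFO reset plan from (skill, target) pairs."""
    forward: List[Step] = []
    stack: List[Step] = []  # every forward step is pushed as it is planned
    for skill, target in chain:
        if skill not in INVERSE_SKILL:
            raise ValueError(f"no known inverse for skill {skill!r}")
        step = Step(skill, target)
        forward.append(step)
        stack.append(step)

    reset: List[Step] = []
    while stack:  # popping yields Last-In, First-Out order
        last = stack.pop()
        reset.append(Step(INVERSE_SKILL[last.skill], last.target))
    return forward, reset


forward, reset = plan_forward_and_reset(
    [("push", "red_block"), ("stack", "red_block")]
)
print([s.skill for s in forward])  # ['push', 'stack']
print([s.skill for s in reset])    # ['unstack', 'pull_back']
```

For a push-then-stack chain like the one in Figure 2(c), the reset plan is unstack and then pull back: the stacked block has to come off before the push can be reversed, which is why the reset order is the reverse of the forward order.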

Paper Structure (25 sections, 7 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Overview of the RADAR pipeline and the state transition diagram of its decoupled Finite State Machine (FSM). For logical clarity, the architecture strictly separates physical execution loops (States A, B, C) from concurrent data routing actions (States D, E). A fully successful execution forms a continuous loop (B → C → B) that concurrently triggers Dual Storage (D), repeatedly harvesting trajectory variations without re-planning. In contrast, an asymmetric recovery loop (B → C → A) handles reset failures by saving only the valid forward trajectory via Single Storage (E) and initiating a new planning cycle on the altered workspace. This architecture keeps the engine self-sustaining, with no human in the loop.
  • Figure 2: Overview of our hierarchical Scene-Relevant Task Generation framework across varying complexities. (a) Atomic Task in Simple Environments: The VLM performs direct affordance matching, mapping a deformable object task (folding a towel) to a geometrically congruent 3D prior (closing a box). (b) Atomic Task in Complex Environments: Through selective attention, the planner actively masks out distractors (e.g., strawberry, Rubik's cube) to precisely ground the target object (lemon) and retrieve a robust prior. (c) Long-Horizon Skill Chaining: For multi-step tasks, the VLM orchestrates a forward skill chain (pushing and stacking blocks) while concurrently generating a strict Last-In, First-Out (LIFO) causal sequence to autonomously construct executable environment-resetting plans.
  • Figure 3: Visualizations of long-horizon tasks in the RLBench simulation. (a) Put Laptop & Cup into Tray: The robot successfully executes a multi-step sequence requiring interactions with multiple distinct objects. (b) Close then Open Box: The pipeline reliably performs state-dependent articulated actions, demonstrating its robust skill chaining capability.
  • Figure 4: Qualitative results of our automated data collection pipeline across different scenarios. (a) Simple Atomic Scenario: The robot executes deformable object manipulation (folding a towel) in a real-world setting without distractors. (b) Complex Atomic Scenario: The pipeline utilizes selective attention to isolate a target object (a strawberry) among visual distractors for precise grasping. (c) Long-Horizon Scenario: In simulation, the VLM decomposes a complex instruction into a sequential skill chain (e.g., first pushing, then stacking a block).
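The transitions described for Figure 1 can be sketched as a small state machine. This is a minimal illustration under stated assumptions: we read State A as planning, B as forward execution, and C as reset execution, and we read Dual Storage as saving both the forward and the reset trajectory. The caption does not say what happens after a failed forward attempt, so the discard-and-re-plan branch below is hypothetical, as are the `Collector` class and its method names.

```python
from enum import Enum
from typing import Any, List, Optional


class State(Enum):
    # Physical execution states from Figure 1; these meanings are assumptions.
    PLAN = "A"     # VLM generates a forward plan and its LIFO reset plan
    FORWARD = "B"  # policy executes the forward plan
    RESET = "C"    # policy executes the reset plan


class Collector:
    """Hypothetical FSM with asymmetric data routing (States D and E)."""

    def __init__(self) -> None:
        self.state = State.PLAN
        self.dataset: List[Any] = []                # trajectories kept for training
        self.pending_forward: Optional[Any] = None  # forward trajectory awaiting routing

    def _expect(self, state: State) -> None:
        if self.state is not state:
            raise RuntimeError(f"expected {state.name}, currently {self.state.name}")

    def on_plan_done(self) -> None:
        self._expect(State.PLAN)
        self.state = State.FORWARD  # A -> B

    def on_forward_done(self, trajectory: Any, success: bool) -> None:
        self._expect(State.FORWARD)
        if success:
            self.pending_forward = trajectory
            self.state = State.RESET  # B -> C
        else:
            # Hypothetical: discard the failed attempt and re-plan (not shown in Figure 1).
            self.state = State.PLAN

    def on_reset_done(self, trajectory: Any, success: bool) -> None:
        self._expect(State.RESET)
        if success:
            # Dual Storage (D): keep forward and reset trajectories, then loop
            # C -> B so the same plan runs again without re-planning.
            self.dataset.extend([self.pending_forward, trajectory])
            self.state = State.FORWARD
        else:
            # Single Storage (E): keep only the valid forward trajectory and
            # re-plan on the altered workspace (C -> A).
            self.dataset.append(self.pending_forward)
            self.state = State.PLAN
        self.pending_forward = None


c = Collector()
c.on_plan_done()
c.on_forward_done("fwd_1", success=True)
c.on_reset_done("reset_1", success=True)   # Dual Storage, back to FORWARD
c.on_forward_done("fwd_2", success=True)
c.on_reset_done("reset_2", success=False)  # Single Storage, back to PLAN
print(c.state.name, c.dataset)  # PLAN ['fwd_1', 'reset_1', 'fwd_2']
```

Placing the storage decision inside the reset handler mirrors the separation in the caption: the physical loop determines the next state, and data routing (D or E) is a side effect of that transition.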