VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson, Nikhil J Joshi, Daniel Lam, Tsang-Wei Edward Lee, Alex Luong, Sharath Maddineni, Harsh Patel, Jodilyn Peralta, Jornell Quiambao, Diego Reyes, Rosario M Jauregui Ruano, Dorsa Sadigh, Pannag Sanketi, Leila Takayama, Pavel Vodenski, Fei Xia

TL;DR

VADER targets long-horizon robotic tasks by grounding execution in visual affordance detection and error detection through visual question answering (VQA), coupled with a language model planner (LMP) and a cloud-based HRFS that solicits help from other robots or humans. It implements a plan-execute-detect-recover loop that can replan mid-execution and seek assistance when necessary, enabling collaboration across heterogeneous agents. The authors demonstrate feasibility through a two-robot pilot study and a human-robot interaction study (N=19) in an office-kitchen setting. Participants rated VADER significantly higher than the control on whether they could help the robot successfully, with no significant difference on the other questionnaire statements, and tasks took longer because of cloud-based planning. The work advances practical distributed autonomy through environment-grounded recovery and multi-agent assistance; future work focuses on reducing latency and refining help-selection strategies.

Abstract

Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon tasks with the help of humans or other robots. VADER leverages visual question answering (VQA) modules to detect visual affordances and recognize execution errors. It then generates prompts for a language model planner (LMP), which decides when to seek help from another robot or a human to recover from errors in long-horizon task execution. We show the effectiveness of VADER with two long-horizon robotic tasks. Our pilot study showed that VADER is capable of performing complex long-horizon tasks by asking another robot for help to clear a table. Our user study showed that VADER is capable of performing complex long-horizon tasks by asking a human for help to clear a path. We gathered feedback from people (N=19) about the performance of VADER vs. a robot that did not ask for help. https://google-vader.github.io/
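
The plan, execute, detect loop with help-seeking described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the skill names, the `World` class, and the functions `plan`, `affordance_ok`, `execute`, `seek_help`, and `run` are all hypothetical stand-ins, and the VQA module and the LMP are replaced by trivial rules so the example runs on its own. It also folds affordance detection and error detection into a single pre-execution check for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class World:
    """Toy world state; `obstructed` stands in for what a camera would observe."""
    obstructed: bool = True
    log: List[str] = field(default_factory=list)


def plan(task: str) -> List[str]:
    # Hypothetical stand-in for the language model planner (LMP):
    # returns a fixed skill sequence regardless of the task text.
    return ["go_to_table", "wipe_table", "return_to_base"]


def affordance_ok(skill: str, world: World) -> bool:
    # Hypothetical stand-in for a VQA query such as "Is the table clear?".
    return not (skill == "wipe_table" and world.obstructed)


def execute(skill: str, world: World) -> None:
    world.log.append(skill)


def seek_help(skill: str, world: World) -> None:
    # Stand-in for asking another robot or a human; here help always succeeds.
    world.log.append(f"seek_help({skill})")
    world.obstructed = False


def run(task: str, world: World, max_help_requests: int = 3) -> bool:
    queue = plan(task)
    help_requests = 0
    while queue:
        skill = queue[0]
        if not affordance_ok(skill, world):
            if help_requests >= max_help_requests:
                return False  # give up instead of asking forever
            seek_help(skill, world)
            help_requests += 1
            continue  # detect again before retrying the same skill
        execute(skill, world)
        queue.pop(0)
    return True


world = World()
succeeded = run("wipe the table", world)
print(succeeded, world.log)
```

In this toy run the obstruction is detected before `wipe_table`, a help request is issued, and the original plan then resumes. The cap on help requests is an added assumption to keep the loop bounded.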

Paper Structure (14 sections, 4 figures, 1 algorithm)

Figures (4)

  • Figure 1: VADER: Visual Affordance Detection and Error Recovery. A plan, execute, detect framework with a seeking-help skill as a recovery mechanism. While executing a plan, the robot detects a deviation from its expectations: a Coke can obstructing the table area to be wiped. It replans for recovery and, upon receiving help, completes the original plan.
  • Figure 2: LMP replanning with a request for help. The process is framed as a conversation between different components. The immediate next skill from the original LMP plan at step $n$ is passed through the outcome description function $\mathcal{O}$ and the VQA module $\mathcal{V}_{\text{QA}}$ to assess execution status, and the result is fed back to the LMP. The LMP then lays out a recovery plan.
  • Figure 3: Experimental setup. Left: the experimental setup, with the robot trajectory overlaid onto the Matterport-scanned version of the space used for experiments. Middle: (top) table-wiping expert with the wiping tool, (bottom) manipulation expert with gripper. Right: a snapshot from one of the experiments in our user study, discussed in the user-studies section.
  • Figure 4: Means and standard errors for participant responses to HRI questionnaires. VADER significantly outperformed the control on the statement asking participants whether they could help the robot successfully. No significant difference was observed between the conditions on the other statements.
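
The conversational flow in the Figure 2 caption, where a skill passes through $\mathcal{O}$ and $\mathcal{V}_{\text{QA}}$ and the result returns to the LMP, might look something like the sketch below. Everything here is an assumption made for illustration: the question wording, the `OUTCOMES` table, the dictionary-based `vqa` stand-in, and the transcript format are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical outcome descriptions O(skill): the condition to verify for each skill.
OUTCOMES: Dict[str, str] = {
    "go_to_table": "Is the path to the table clear?",
    "wipe_table": "Is the table free of objects?",
}


def outcome_description(skill: str) -> str:
    return OUTCOMES[skill]


def vqa(question: str, scene: Dict[str, bool]) -> str:
    # Stand-in for V_QA: looks the answer up in a dict instead of querying an image model.
    return "yes" if scene.get(question, False) else "no"


def assess_step(n: int, skill: str, scene: Dict[str, bool], transcript: List[str]) -> bool:
    """Append one check to the running conversation and report whether it passed."""
    question = outcome_description(skill)
    answer = vqa(question, scene)
    transcript.append(f"Robot: step {n} is {skill}.")
    transcript.append(f"Check: {question}")
    transcript.append(f"VQA: {answer}")
    return answer == "yes"


def lmp_prompt(transcript: List[str]) -> str:
    # The accumulated conversation becomes the prompt handed back to the planner.
    return "\n".join(transcript) + "\nPlanner:"


scene = {"Is the table free of objects?": False}
transcript: List[str] = []
passed = assess_step(2, "wipe_table", scene, transcript)
print(lmp_prompt(transcript))
```

Keeping the exchange as plain text lets a planner that consumes prompts see both the expected outcome and the observed answer before deciding whether to continue or to request help.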