ARMADA: Autonomous Online Failure Detection and Human Shared Control Empower Scalable Real-world Deployment and Adaptation

Wenye Yu, Jun Lv, Zixi Ying, Yang Jin, Chuan Wen, Cewu Lu

TL;DR

ARMADA tackles real-world deployment of imitation policies by combining FLOAT, an online failure detector based on Optimal Transport (OT), with a scalable multi-robot shared control loop. FLOAT matches the policy embeddings of the current rollout against expert demonstrations via OT and outputs a failure index. This index supports a universal threshold for flagging failures and adaptive rewinding that keeps the collected demonstrations informative, significantly reducing the need for constant human oversight. Across four real-world tasks, FLOAT achieves nearly 95% accuracy, and ARMADA yields a more than 4x gain in success rate and a more than 2x reduction in human intervention rate compared with prior human-in-the-loop methods. The approach supports scalable deployment and rapid adaptation to unseen scenarios, with the methodology and code to be open-sourced.

Abstract

Imitation learning has shown promise in learning from large-scale real-world datasets. However, pretrained policies usually perform poorly without sufficient in-domain data. Moreover, human-collected demonstrations entail substantial labour and tend to contain mixed-quality data and redundant information. As a workaround, human-in-the-loop systems gather domain-specific data for policy post-training and exploit closed-loop policy feedback to offer informative guidance, but they usually require full-time human surveillance during policy rollout. In this work, we devise ARMADA, a multi-robot deployment and adaptation system with human-in-the-loop shared control, featuring an autonomous online failure detection method named FLOAT. Thanks to FLOAT, ARMADA enables parallel policy rollout and requests human intervention only when necessary, significantly reducing reliance on human supervision. Hence, ARMADA enables efficient acquisition of in-domain data, leading to more scalable deployment and faster adaptation to new scenarios. We evaluate the performance of ARMADA on four real-world tasks. FLOAT achieves nearly 95% accuracy on average, surpassing prior state-of-the-art failure detection approaches by over 20%. In addition, ARMADA achieves a more than 4$\times$ increase in success rate and a greater than 2$\times$ reduction in human intervention rate over multiple rounds of policy rollout and post-training, compared to previous human-in-the-loop learning methods.
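
The idea in the abstract of requesting human intervention only when a failure is detected can be illustrated with a small dispatch routine. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `dispatch`, the first-come-first-served queue, and the robot and operator identifiers are all hypothetical.

```python
from collections import deque


def dispatch(failure_flags, idle_operators, waiting=None):
    """Pair robots flagged as failed with idle human operators.

    failure_flags:  dict mapping robot id -> bool (True if the detector fired)
    idle_operators: list of operator ids that are currently free
    waiting:        robots flagged earlier that are still awaiting help
    Returns (assignments, waiting): a dict robot -> operator and the
    queue of robots that remain unassigned.
    """
    waiting = deque(waiting or [])
    # Queue newly failed robots behind those already waiting (FIFO is an assumption).
    for robot, failed in failure_flags.items():
        if failed and robot not in waiting:
            waiting.append(robot)
    free = deque(idle_operators)
    assignments = {}
    # Hand out operators until either robots or operators run out.
    while waiting and free:
        assignments[waiting.popleft()] = free.popleft()
    return assignments, waiting


flags = {"robot0": False, "robot1": True, "robot2": True}
assignments, still_waiting = dispatch(flags, ["operator0"])
print(assignments, list(still_waiting))
```

Robots whose flag is False keep running autonomously, so the number of operators can be smaller than the number of robots.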

Paper Structure

This paper contains 21 sections, 7 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of ARMADA. ARMADA uses the FLOAT failure detector to enable parallel policy rollout on multiple robots, requesting human intervention only when necessary. The deployment data collected online are then used for policy improvement, forming a scalable deployment and adaptation loop.
  • Figure 2: Method overview. The FLOAT failure detector performs real-time OT matching between the policy embeddings of the current rollout and those of all expert demonstrations, and defines the minimum OT cost as the FLOAT index. The FLOAT threshold is calibrated on all successful rollouts. When the FLOAT index of a rollout trajectory exceeds the threshold, the rollout is considered a failure and adaptive rewinding based on the OT computation is applied, which retraces to an earlier timestep before the scene was disturbed. The multi-robot system then allocates an idle human operator to the failed robot for intervention.
  • Figure 3: Failure detection experiment results. FLOAT achieves nearly 95% accuracy across four tasks, an improvement of over 20% compared to state-of-the-art baseline methods. It performs comparably to a variant that additionally integrates an action inconsistency metric, showcasing the effectiveness of FLOAT in detecting various failures.
  • Figure 4: Real-world task setup.
  • Figure 5: Success rate over three evaluation rounds. ARMADA exhibits stable progress in success rate, with a more than four-fold increase compared to a previous human-in-the-loop learning approach, thanks to our adaptive rewinding mechanism.
  • ...and 4 more figures
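
The Figure 2 caption describes the FLOAT index as the minimum OT cost between the rollout's policy embeddings and each expert demonstration, compared against a threshold calibrated on successful rollouts. The sketch below illustrates that pipeline under several assumptions that are not taken from the paper: a cosine cost, uniform marginals, an entropy-regularised Sinkhorn solver, and a quantile-based calibration rule. All function names and parameters (`sinkhorn_cost`, `float_index`, `calibrate_threshold`, `reg`, `q`) are hypothetical, and adaptive rewinding is not covered.

```python
import numpy as np


def sinkhorn_cost(x, y, reg=0.05, n_iters=200):
    """Entropy-regularised OT cost between embedding sequences x (n, d) and y (m, d)."""
    eps = 1e-12
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    yn = y / (np.linalg.norm(y, axis=1, keepdims=True) + eps)
    cost = 1.0 - xn @ yn.T                 # cosine cost matrix (assumption)
    a = np.full(len(x), 1.0 / len(x))      # uniform mass over rollout steps
    b = np.full(len(y), 1.0 / len(y))      # uniform mass over demo steps
    K = np.exp(-cost / reg)                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):               # Sinkhorn scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]     # approximate transport plan
    return float(np.sum(plan * cost))


def float_index(rollout, demos):
    """Minimum OT cost between the rollout (or its prefix so far) and any demo."""
    return min(sinkhorn_cost(rollout, demo) for demo in demos)


def calibrate_threshold(success_indices, q=0.95):
    """Threshold from indices of successful rollouts; the quantile rule is an assumption."""
    return float(np.quantile(success_indices, q))


# Synthetic embeddings standing in for real policy embeddings.
rng = np.random.default_rng(0)
base = np.ones(16)
off = np.tile([1.0, -1.0], 8)              # direction orthogonal to base


def traj(center, steps=20):
    return center + 0.1 * rng.standard_normal((steps, 16))


demos = [traj(base) for _ in range(3)]
success_idx = [float_index(traj(base), demos) for _ in range(10)]
threshold = calibrate_threshold(success_idx)
bad_idx = float_index(traj(off), demos)
print("threshold:", threshold, "off-distribution index:", bad_idx)
```

In an online setting the same `float_index` call would be repeated on the growing rollout prefix at each step, with a failure flagged once the index exceeds `threshold`.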