Shared Control with Black Box Agents using Oracle Queries

Inbal Avraham, Reuth Mirsky

TL;DR

This work extends shared control by introducing an oracle-driven query channel between a cooperating agent and a learning control agent, formalized as an MA-MDP with two state spaces and a configurable operation protocol. It proposes three querying heuristics (Entropy, Utility, and Reinforcement Learning) that decide when to consult the oracle, aiming to reduce learning cost while maintaining or improving policy performance. Empirical evaluation on automata-based tasks and a Lunar Lander domain shows that querying can accelerate learning and that the heuristics can substantially reduce the number of queries needed, with trade-offs in accuracy and a dependence on the type of oracle. The findings suggest practical benefits for faster, more reliable shared control, while highlighting the importance of oracle quality and the potential of adaptive querying strategies for real-world deployment.
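The Entropy heuristic is named here but its exact rule is not given. As a minimal sketch, assume the learner keeps Q-value estimates for its actions, turns them into a softmax (Boltzmann) distribution, and queries the oracle whenever the entropy of that distribution exceeds a threshold. The softmax conversion, the `threshold` and `temperature` parameters, and all function names are hypothetical illustrations, not the paper's definitions.

```python
import math


def softmax(q_values, temperature=1.0):
    # Hypothetical: turn action-value estimates into a Boltzmann distribution.
    # Subtracting the max keeps exp() numerically stable.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability actions contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_query_entropy(q_values, threshold, temperature=1.0):
    # Assumed rule: ask the oracle when the learner is uncertain,
    # i.e. when its action distribution is close to uniform.
    return entropy(softmax(q_values, temperature)) > threshold


if __name__ == "__main__":
    # Uniform values: entropy = ln 4 (about 1.386) exceeds 1.0, so query.
    print(should_query_entropy([0.0, 0.0, 0.0, 0.0], threshold=1.0))  # True
    # One clearly dominant action: entropy is near 0, so act without asking.
    print(should_query_entropy([10.0, 0.0, 0.0, 0.0], threshold=1.0))  # False
```

Raising `threshold` makes the learner query less often; this is one plausible knob for trading the number of queries against learning speed.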

Abstract

Shared control problems involve a robot learning to collaborate with a human. When learning a shared control policy, brief communication between the agents can often significantly reduce running times and improve the system's accuracy. We extend the shared control problem to include the ability to directly query a cooperating agent. We consider two types of potential responses to a query, which we call oracles: one that can provide the learner with the best action to take, even when that action might be myopically wrong, and one with bounded knowledge limited to its own part of the system. Given this additional information channel, this work further presents three heuristics for choosing when to query: reinforcement learning-based, utility-based, and entropy-based. These heuristics aim to reduce a system's overall learning cost. Empirical results on two environments show the benefits of querying for learning a better control policy and the trade-offs between the proposed heuristics.
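The difference between the two oracle types can be illustrated with a small tabular sketch. It assumes, hypothetically, that the joint state is a pair (control state, environment state), that the expert oracle answers from a value table over the full joint state, and that the teacher oracle sees only its own (environment) part of the state. The class names follow the "expert" and "teacher" labels used in Figure 3, but the tables, the tabular representation, and the method names are illustrative assumptions rather than the paper's formalization.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical joint state: (control_state, env_state).
State = Tuple[Hashable, Hashable]


def argmax(values: Sequence[float]) -> int:
    # Index of the largest value; ties resolve to the lowest index.
    return max(range(len(values)), key=lambda i: values[i])


class ExpertOracle:
    """Assumed full-knowledge oracle: returns the best action for the
    joint state, even if that action looks wrong myopically."""

    def __init__(self, joint_q: Dict[State, Sequence[float]]):
        self.joint_q = joint_q

    def answer(self, state: State) -> int:
        return argmax(self.joint_q[state])


class TeacherOracle:
    """Assumed bounded-knowledge oracle: it observes only its own part of
    the state and answers from a local value table."""

    def __init__(self, local_q: Dict[Hashable, Sequence[float]]):
        self.local_q = local_q

    def answer(self, state: State) -> int:
        _, env_state = state  # the control part is invisible to this oracle
        return argmax(self.local_q[env_state])


if __name__ == "__main__":
    expert = ExpertOracle({("c0", "e0"): [0.0, 1.0], ("c1", "e0"): [1.0, 0.0]})
    teacher = TeacherOracle({"e0": [0.5, 0.4]})
    print(expert.answer(("c0", "e0")), teacher.answer(("c0", "e0")))  # 1 0
    print(expert.answer(("c1", "e0")), teacher.answer(("c1", "e0")))  # 0 0
```

Because the teacher ignores the control state, it gives the same answer in ("c0", "e0") and ("c1", "e0"), whereas the expert can tell them apart; this is one way bounded knowledge could make an oracle's advice less accurate.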

Paper Structure

This paper contains 15 sections, 14 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The new shared system with queries framework.
  • Figure 2: The use cases from the Automata domain and the Lunar Lander domain, used for the evaluation of the shared control synthesis with queries. For each Automata use case, the states of the control automaton are labeled $c_i$ and the states of the environment automaton are labeled $e_i$.
  • Figure 3: Number of queries used per training episode of the Combination Lock use case. Red lines represent tests with the expert oracle, while blue lines represent tests with the teacher oracle. The results are truncated after 15 episodes, as the algorithm's behavior does not change afterward.

Theorems & Definitions (3)

  • Definition 1
  • Example 1
  • Definition 2