Adaptive Bi-Level Multi-Robot Task Allocation and Learning under Uncertainty with Temporal Logic Constraints

Xiaoshan Lin, Roberto Tron

TL;DR

This work addresses multi-robot coordination under unknown robot transition models. It presents a bi-level framework that integrates high-level task allocation with low-level distributed policy learning and execution, so that tasks specified in Time Window Temporal Logic are satisfied with user-defined probability thresholds.

Abstract

This work addresses the problem of multi-robot coordination under unknown robot transition models, ensuring that tasks specified by Time Window Temporal Logic are satisfied with user-defined probability thresholds. We present a bi-level framework that integrates (i) high-level task allocation, where tasks are assigned based on the robots' estimated task completion probabilities and expected rewards, and (ii) low-level distributed policy learning and execution, where robots independently optimize auxiliary rewards while fulfilling their assigned tasks. To handle uncertainty in robot dynamics, our approach leverages real-time task execution data to iteratively refine expected task completion probabilities and rewards, enabling adaptive task allocation without explicit robot transition models. We theoretically validate the proposed algorithm, demonstrating that the task assignments meet the desired probability thresholds with high confidence. Finally, we demonstrate the effectiveness of our framework through comprehensive simulations.
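
As a rough illustration of how execution data could yield a high-confidence lower bound on a task completion probability, the sketch below applies a one-sided Hoeffding bound to observed success counts and keeps only the robots whose bound clears the desired threshold. This is an assumption made for illustration, not the estimator used in the paper; the names `hoeffding_lower_bound` and `feasible_robots`, the confidence parameter `delta`, and the example counts are all hypothetical.

```python
import math


def hoeffding_lower_bound(successes, trials, delta):
    """Lower bound on a success probability that holds with confidence 1 - delta.

    Uses the one-sided Hoeffding inequality (an illustrative choice,
    not necessarily the bound used in the paper).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if trials == 0:
        # No data yet: the only safe lower bound is 0.
        return 0.0
    p_hat = successes / trials
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * trials))
    return max(0.0, p_hat - margin)


def feasible_robots(stats, threshold, delta=0.05):
    """Return robots whose lower-bounded completion probability meets the threshold.

    stats maps a robot name to a (successes, trials) pair for one task.
    """
    return [
        robot
        for robot, (successes, trials) in stats.items()
        if hoeffding_lower_bound(successes, trials, delta) >= threshold
    ]


# Hypothetical execution data: both robots succeed 95% of the time,
# but only r2 has enough episodes for the bound to exceed 0.9.
stats = {"r1": (95, 100), "r2": (950, 1000)}
print(feasible_robots(stats, threshold=0.9))
```

With these made-up counts, the bound tightens as more episodes are observed, which mirrors the adaptive refinement described in the abstract: a robot can become eligible for a task once enough execution data has accumulated.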

Paper Structure

This paper contains 20 sections, 2 theorems, 9 equations, 7 figures, 3 tables, 3 algorithms.

Key Result

proposition 1

Let $\lfloor{P}^{\epsilon}_{i,k}\rfloor$ be an arbitrary lower bound for ${P}^{\epsilon}_{i,k}$, that is, $0 \leq \lfloor{P}^{\epsilon}_{i,k}\rfloor\leq {P}^{\epsilon}_{i,k} \leq 1$. If a set of probabilities $\{P_{i,k}\}$ satisfies then $\{P_{i,k}\}$ also satisfies constraint \eqref{eqn:constraint1}.

Figures (7)

  • Figure 1: Motivating example of our proposed framework. Primary tasks: repeatedly transporting resources from warehouses to processing stations and then to the operation site within specified time windows, which must be satisfied with high probability to prevent resource accumulation. Auxiliary tasks: monitoring traffic congestion to improve future routing and delivery times, or returning to stations after completing the primary tasks.
  • Figure 2: (a) Normal DFA for TWTL formula $\phi=[H^1 P]^{[1,2]} \cdot [H^0 D]^{[0,2]}$. (b) DFA for the temporally relaxed TWTL formula $\phi(\boldsymbol{\tau})=[H^1 P]^{[1,2+\tau_1]} \cdot [H^0 D]^{[0,2+\tau_2]}$. (c) Example of a labeled MDP, where $S = \{s_0, s_1\}$, $A = \{a_0, a_1\}$, $AP = \{P, D\}$, $l(s_0) = \{P\}$ and $l(s_1) = \{D\}$.
  • Figure 3: (a) Transitions (intended - black, unintended - red) under each action. (b) An environment with different types of locations denoted by different colors.
  • Figure 4: TWTL task satisfaction rate. (a) Satisfaction rate over 2000 episodes using static and adaptive lower bounds. (b) Satisfaction rate using adaptive lower bounds, over the first 100 episodes vs. 2000 episodes. The dashed line represents the desired probability of each TWTL task. The bars show the mean values over the 20 iterations, with error bars depicting the standard deviation.
  • Figure 5: (a) Total rewards accumulated by all robots over the episodes. The reward represents the sum of individual rewards collected by each robot. (b) Computation time for solving task assignment (Alg. \ref{alg:bi-level}, lines 4-6) with different numbers of robots.
  • ...and 2 more figures

Theorems & Definitions (12)

  • definition 1
  • definition 2
  • definition 3: Policy
  • Remark 1
  • proposition 1
  • definition 4
  • definition 5: $\epsilon$-Stochastic Transitions
  • definition 6: Distance-To-$F^{\otimes}$
  • definition 7: $\pi^{\epsilon}$ Policy
  • Remark 2
  • ...and 2 more