REBEL: Rule-based and Experience-enhanced Learning with LLMs for Initial Task Allocation in Multi-Human Multi-Robot Teaming

Arjun Gupte, Ruiqi Wang, Vishnunandan L. N. Venkatesh, Taehyeon Kim, Dezhong Zhao, Ziqin Yuan, Byung-Cheol Min

TL;DR

REBEL is a rule-based, experience-enhanced learning framework that augments LLM reasoning for initial task allocation (ITA) in multi-human multi-robot (MH-MR) teams. By combining knowledge acquisition (rule generation and experiential data) with retrieval-augmented inference, it enables efficient multi-objective alignment and dynamic adaptation without fine-tuning. The approach performs strongly in single- and multi-objective settings and improves situational awareness compared to baselines, while also offering test-time adaptability for pre-trained reinforcement learning (RL) ITA policies. These results suggest REBEL's practical potential for deployment-efficient, preference-aware ITA in dynamic MH-MR environments.

Abstract

Multi-human multi-robot teams are increasingly recognized for their efficiency in executing large-scale, complex tasks by integrating heterogeneous yet potentially synergistic humans and robots. However, this inherent heterogeneity presents significant challenges in teaming, necessitating efficient initial task allocation (ITA) strategies that optimally form complementary human-robot pairs or collaborative chains and establish well-matched task distributions. While current learning-based methods demonstrate promising performance, they often incur high computational costs and lack the flexibility to incorporate user preferences in multi-objective optimization (MOO) or adapt to last-minute changes in dynamic real-world environments. To address these limitations, we propose REBEL, an LLM-based ITA framework that integrates rule-based and experience-enhanced learning to enhance LLM reasoning capabilities and improve in-context adaptability to MOO and situational changes. Extensive experiments validate the effectiveness of REBEL in both single-objective and multi-objective scenarios, demonstrating superior alignment with user preferences and enhanced situational awareness to handle unexpected team composition changes. Additionally, we show that REBEL can complement pre-trained ITA policies, further boosting situational adaptability and overall team performance. Website: https://sites.google.com/view/ita-rebel

Paper Structure (24 sections, 8 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Conceptual illustration of the proposed LLM-based REBEL framework for ITA in MH-MR teams. Given a multi-attribute observation that reflects the heterogeneity of the MH-MR team, assigned tasks, potential user preferences for multi-objective optimization (MOO), and last-minute team composition changes, the LLM generates an adaptive ITA plan. The system also retrieves the most relevant guidance rules and prior experiences from previous interactions to enhance decision-making.
  • Figure 2: Illustration of the three stages in the proposed LLM-based REBEL framework for ITA in MH-MR teams. The first two stages comprise the Knowledge Acquisition phase in which the LLM creates ITA plans for different randomized missions and generates a collection of learned rules and experience data through simulation. During the Inferencing stage, the LLM leverages the Rule and Experience Retrieval modules to extract rules and experiences most relevant to the user's input to enhance the quality of its ITA plan.
  • Figure 3: Visual representation of the simulation environment. The scale has been adjusted for clarity. Each POI is color-coded to indicate its complexity level for hazard evaluation.
  • Figure 4: Performance of different methods in MOO settings in terms of the normalized performance of each objective. TP denotes task performance, MT represents mission time, and HW indicates human workload. The normalized values of each objective under different user preference types are connected by lines to illustrate the preference alignment levels. Blue lines indicate user preference prioritizing TP (with TP, MT, HW weights of 0.5, 0.25, 0.25), orange lines prioritize MT (with weights of 0.25, 0.5, 0.25), and green lines prioritize HW (with weights of 0.25, 0.25, 0.5). The actual prioritized objectives for each method are enclosed in a dotted red box.
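
The preference weights quoted in the Figure 4 caption can be made concrete with a small worked example. The sketch below assumes the weights are combined with the normalized objective values through a simple weighted sum, and that every normalized value is oriented so that higher is better. Neither assumption is confirmed by the captions above. The function name `weighted_score` and the sample plan values are hypothetical and serve only as illustration.

```python
from typing import Dict

# Preference weights as quoted in the Figure 4 caption (TP, MT, HW).
PREFERENCES: Dict[str, Dict[str, float]] = {
    "prioritize_TP": {"TP": 0.5, "MT": 0.25, "HW": 0.25},
    "prioritize_MT": {"TP": 0.25, "MT": 0.5, "HW": 0.25},
    "prioritize_HW": {"TP": 0.25, "MT": 0.25, "HW": 0.5},
}


def weighted_score(normalized: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalized objective values.

    Assumption: each normalized value lies in [0, 1] and higher is better,
    so mission time and human workload are taken to be already inverted.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("preference weights must sum to 1")
    return sum(weights[name] * normalized[name] for name in weights)


# Hypothetical normalized results for a single allocation plan (not paper data).
plan = {"TP": 0.8, "MT": 0.6, "HW": 0.4}

# Score the same plan under each of the three user preference types.
scores = {name: weighted_score(plan, w) for name, w in PREFERENCES.items()}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the same plan scores differently depending on which objective the user prioritizes, which is the kind of preference sensitivity the connected lines in Figure 4 are meant to visualize.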