FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams
Corban Rivera, Grayson Byrd, Meghan Booker, Bethany Kemp, Allison Gaines, Emma Holmes, James Uplinger, Celso M de Melo, David Handelman
TL;DR
FLEET coordinates heterogeneous robot teams from natural-language instructions. An LLM front-end outputs a task graph with durations and a capability-aware fitness matrix; a formal MILP scheduler (with an auction-based fallback) then minimizes makespan under precedence and resource constraints. Execution is closed-loop: robots stream status to a world model, and deviations trigger replanning, while the task graph, fitness matrix, and schedule remain available as interpretable artifacts. In PARTNR simulations and hardware trials with two Spot robots, FLEET outperforms purely generative planners, particularly on heterogeneous tasks, and ablations show that the MILP and fitness contributions are complementary. The results point to a practical path toward reliable, language-guided multi-robot coordination in open-world environments, with safety and efficiency benefits on real hardware.
Abstract
Coordinating heterogeneous robot teams from free-form natural-language instructions is hard. Language-only planners struggle with long-horizon coordination and hallucination, while purely formal methods require closed-world models. We present FLEET, a hybrid decentralized framework that turns language into optimized multi-robot schedules. An LLM front-end produces (i) a task graph with durations and precedence and (ii) a capability-aware robot-task fitness matrix; a formal back-end solves a makespan-minimization problem while the underlying robots execute their free-form subtasks with agentic closed-loop control. Across multiple free-form language-guided autonomy coordination benchmarks, FLEET improves success over state-of-the-art generative planners on two-agent teams across heterogeneous tasks. Ablations show that mixed-integer linear programming (MILP) primarily improves temporal structure, while LLM-derived fitness is decisive for capability-coupled tasks; together they deliver the highest overall performance. We demonstrate transfer to real-world settings through hardware trials with a pair of quadruped robots with disjoint capabilities.
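To make the two front-end artifacts concrete, the following minimal sketch shows how a task graph (durations plus precedence) and a robot-task fitness matrix could feed a makespan-minimizing scheduler. It is not FLEET's implementation: the abstract describes a MILP back-end, whereas this sketch substitutes exhaustive enumeration, which is exact but only practical for a handful of tasks. The task names, robot names, durations (in seconds), fitness values, and the `min_fitness` feasibility threshold are all hypothetical, and using fitness as a hard capability cutoff with a tie-break is an assumption rather than the paper's stated objective.

```python
from itertools import permutations, product


def best_schedule(durations, preds, fitness, min_fitness=0.5):
    """Exhaustively search task orders and robot assignments for minimum makespan.

    durations: task -> duration (hypothetical units: seconds)
    preds:     task -> list of tasks that must finish first
    fitness:   robot -> task -> score in [0, 1]
    A robot may take a task only if its score is >= min_fitness (assumption).
    Ties in makespan are broken in favor of higher total fitness.
    """
    tasks = list(durations)
    robots = list(fitness)
    best = None
    for order in permutations(tasks):
        # Keep only orders that respect every precedence edge.
        pos = {t: i for i, t in enumerate(order)}
        if any(pos[p] > pos[t] for t in tasks for p in preds.get(t, ())):
            continue
        for choice in product(robots, repeat=len(tasks)):
            assign = dict(zip(order, choice))
            # Capability check taken from the fitness matrix.
            if any(fitness[r][t] < min_fitness for t, r in assign.items()):
                continue
            robot_free = {r: 0 for r in robots}
            start, finish = {}, {}
            for t in order:
                r = assign[t]
                # A task starts once its robot is free and all predecessors are done.
                ready = [finish[p] for p in preds.get(t, ())]
                start[t] = max([robot_free[r]] + ready)
                finish[t] = start[t] + durations[t]
                robot_free[r] = finish[t]
            makespan = max(finish.values())
            total_fit = sum(fitness[r][t] for t, r in assign.items())
            key = (makespan, -total_fit)
            if best is None or key < best[0]:
                best = (key, assign, start, makespan)
    if best is None:
        raise ValueError("no feasible assignment for the given fitness threshold")
    _, assign, start, makespan = best
    return assign, start, makespan


# Hypothetical two-robot team with disjoint capabilities.
durations = {"open_door": 20, "inspect_room": 30, "pick_box": 25, "photograph_box": 10}
preds = {
    "inspect_room": ["open_door"],
    "pick_box": ["open_door"],
    "photograph_box": ["pick_box"],
}
fitness = {
    "arm_robot": {"open_door": 0.9, "inspect_room": 0.1, "pick_box": 0.9, "photograph_box": 0.1},
    "camera_robot": {"open_door": 0.0, "inspect_room": 0.9, "pick_box": 0.0, "photograph_box": 0.9},
}

assign, start, makespan = best_schedule(durations, preds, fitness)
for task in sorted(start, key=start.get):
    print(f"{start[task]:>3} s  {assign[task]:<12}  {task}")
print("makespan:", makespan)
```

In this toy instance the fitness matrix forces the assignment, so the scheduler's only freedom is temporal ordering. The camera robot is the bottleneck: it waits 20 s for the door, then needs 30 s and 10 s for its two tasks, which gives a makespan of 60 s. A MILP formulation would encode the same precedence and one-task-per-robot-at-a-time constraints with decision variables instead of enumeration.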
