SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins
Aladin Djuhera, Amin Seffo, Vlad C. Andrei, Holger Boche, Walid Saad
TL;DR
SCoTT addresses robot trajectory planning under wireless performance constraints. It introduces Strategic Chain-of-Thought Tasking, a framework that uses multi-modal vision-language models (VLMs) and retrieval-augmented generation to process wireless heatmaps and path gains from a digital twin. The framework decomposes the planning task into strategy-guided subtasks, which grounds the model's reasoning and guards against hallucinations, and its output can seed the optimal dynamic-programming solver DP-WA* to accelerate search. Empirically, SCoTT achieves path gains within 2% of DP-WA* while producing shorter trajectories, and it reduces DP-WA* execution time by up to 62% when used as a seed. The approach is validated in ROS/Gazebo simulations and is compatible with both large and compact VLMs, the latter being suited to on-device deployment. The paper also discusses practical data pipelines and deployment considerations for 6G-enabled digital twins.
Abstract
Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incurs prohibitive search costs. In this paper, we propose SCoTT, a wireless-aware path planning framework that leverages vision-language models (VLMs) to co-optimize average path gains and trajectory length using wireless heatmap images and ray-tracing data from a digital twin (DT). At the core of our framework is Strategic Chain-of-Thought Tasking (SCoTT), a novel prompting paradigm that decomposes the exhaustive search problem into structured subtasks, each solved via chain-of-thought prompting. To establish strong baselines, we compare classical A* and wireless-aware extensions of it, and derive DP-WA*, an optimal, iterative dynamic programming algorithm that incorporates all path gains and distance metrics from the DT, but at significant computational cost. In extensive experiments, we show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. Moreover, SCoTT's intermediate outputs can be used to accelerate DP-WA* by reducing its search space, saving up to 62% in execution time. We validate our framework using four VLMs, demonstrating effectiveness across both large and small models, thus making it applicable to a wide range of compact models at low inference cost. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations. Finally, we discuss data acquisition pipelines, compute requirements, and deployment considerations for VLMs in 6G-enabled DTs, underscoring the potential of natural language interfaces for wireless-aware navigation in real-world applications.
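To make the idea of a "wireless-aware extension" of A* concrete, the following is a minimal sketch, not the authors' algorithm and not DP-WA*. It assumes a 4-connected grid, a per-cell path gain map in dB (standing in for digital-twin ray-tracing data), and a hypothetical trade-off weight `lam` that penalizes low-gain cells. The function names, the normalization, and the cost form are all illustrative assumptions.

```python
import heapq


def wireless_aware_astar(gain_db, blocked, start, goal, lam=1.0):
    """A* on a 4-connected grid with a path-gain penalty (illustrative only).

    Assumed cost of entering a cell: 1 + lam * normalized_gain_deficit,
    where the deficit is 0 for the best-covered cell and 1 for the worst.
    """
    rows, cols = len(gain_db), len(gain_db[0])
    flat = [g for row in gain_db for g in row]
    g_min, g_max = min(flat), max(flat)
    span = (g_max - g_min) or 1.0  # avoid division by zero on flat maps

    def step_cost(cell):
        r, c = cell
        return 1.0 + lam * (g_max - gain_db[r][c]) / span

    def heuristic(cell):
        # Manhattan distance is admissible because every step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0.0}      # cheapest known cost to reach each cell
    parent = {start: None}   # back-pointers for path reconstruction
    frontier = [(heuristic(start), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or blocked[nr][nc]:
                continue
            new_cost = best[cell] + step_cost(nxt)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost + heuristic(nxt), nxt))
    return None  # goal unreachable


def average_gain(gain_db, path):
    """Mean path gain in dB over the cells of a path."""
    return sum(gain_db[r][c] for r, c in path) / len(path)
```

With `lam=0` the sketch reduces to classical shortest-path A*. Increasing `lam` trades trajectory length for average path gain, which is the co-optimization the abstract describes. A single scalar weight is only one possible way to express that trade-off.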
