LASMP: Language Aided Subset Sampling Based Motion Planner

Saswati Bhattacharjee, Anirban Sinha, Chinwe Ekenna

TL;DR

LASMP addresses the inefficiency of traditional sampling-based planners by grounding natural language commands into low-level navigation cues and using them to guide a subset-based RRT that solves a sequence of subproblems. It integrates Whisper for speech-to-text, RoBERTa for named-entity recognition (NER) to extract turns and destinations, and a modified RRT that samples from a local rectangular subset defined by heading parameters, using ray-casting to detect feasible intersections. Relative to standard RRT, the method substantially improves sample efficiency (about 55% fewer tree nodes and about 80% fewer random sample queries) while still producing safe, collision-free paths, and it is demonstrated both in simulation and on a real robot in indoor environments. The framework lays the groundwork for practical, language-assisted navigation, with future extensions to dynamic obstacles and larger-scale deployment.
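The TL;DR mentions ray-casting to detect feasible intersections. A minimal sketch of one way this could work is below, assuming a 2D occupancy grid (1 = occupied, 0 = free, `grid[row][col]` with row as y and col as x). It marches forward along the current heading and casts a perpendicular ray at regular stations; a station whose side ray travels at least `min_opening` is treated as a candidate turn. The function names, the station spacing, and the opening threshold are hypothetical illustrations, not the paper's actual criterion.

```python
import math
from typing import List, Tuple

Grid = List[List[int]]  # 1 = occupied, 0 = free; grid[row][col], row ~ y, col ~ x
Point = Tuple[float, float]


def cast_ray(grid: Grid, start: Point, heading: float,
             max_range: float, step: float = 0.1) -> float:
    """March from `start` along `heading` (radians) in increments of `step`
    and return the distance at which an occupied cell or the map border is
    reached; return `max_range` if neither is reached."""
    x0, y0 = start
    dx, dy = math.cos(heading), math.sin(heading)
    for i in range(1, int(max_range / step) + 1):
        d = i * step
        col, row = math.floor(x0 + d * dx), math.floor(y0 + d * dy)
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            return d  # leaving the map is treated like hitting a wall
        if grid[row][col] == 1:
            return d
    return max_range


def find_side_openings(grid: Grid, start: Point, heading: float,
                       side: str = "left", station_step: float = 0.5,
                       min_opening: float = 2.0,
                       max_range: float = 20.0) -> List[Point]:
    """Hypothetical intersection detector: walk forward along `heading` until
    the first obstacle and report stations whose side ray (left = +90 deg,
    right = -90 deg) is at least `min_opening` long."""
    side_heading = heading + (math.pi / 2 if side == "left" else -math.pi / 2)
    forward_free = cast_ray(grid, start, heading, max_range)
    openings: List[Point] = []
    s = 0.0
    while s < forward_free:
        px = start[0] + s * math.cos(heading)
        py = start[1] + s * math.sin(heading)
        if cast_ray(grid, (px, py), side_heading, max_range) >= min_opening:
            openings.append((px, py))
        s += station_step
    return openings
```

For an east-facing corridor with a single gap in its northern wall, only the stations inside the gap are reported, which gives the planner a concrete point at which to execute a "turn left" cue.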

Abstract

This paper presents the Language Aided Subset Sampling Based Motion Planner (LASMP), a system that helps mobile robots plan their movements using natural language instructions. LASMP uses a modified version of the Rapidly-exploring Random Tree (RRT) method, guided by user-provided commands processed through a language model (RoBERTa). The system improves efficiency by focusing sampling on specific regions of the robot's workspace indicated by these instructions, making planning faster and less resource-intensive. Compared to traditional RRT methods, LASMP reduces the number of nodes needed by 55% and cuts random sample queries by 80%, while still generating safe, collision-free paths. Tested in both simulated and real-world environments, LASMP performed better than traditional RRT in complex indoor scenarios. The results highlight the potential of combining language processing with motion planning to make robot navigation more efficient.
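To make "focusing on specific regions of the workspace" concrete, here is a minimal sketch of a generic RRT whose random samples are drawn only from an oriented rectangle (start point, heading, length, width). This is not the paper's Algorithm 1: the rectangle parameters, step size, goal tolerance, and the `is_free` collision callback are all assumptions chosen for illustration. The function also returns the number of random samples drawn, the kind of quantity the abstract's "random sample queries" figure refers to.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def sample_in_oriented_rect(rng: random.Random, origin: Point, heading: float,
                            length: float, width: float) -> Point:
    """Uniform sample from a rectangle that starts at `origin`, extends
    `length` along `heading`, and spans `width` centred on that axis."""
    u = rng.uniform(0.0, length)             # coordinate along the heading
    v = rng.uniform(-width / 2, width / 2)   # lateral coordinate (left positive)
    c, s = math.cos(heading), math.sin(heading)
    return (origin[0] + u * c - v * s, origin[1] + u * s + v * c)


def segment_free(a: Point, b: Point, is_free: Callable[[Point], bool],
                 resolution: float = 0.05) -> bool:
    """Check evenly spaced points on segment a-b (inclusive) for collisions."""
    n = max(1, int(math.dist(a, b) / resolution))
    return all(is_free((a[0] + (b[0] - a[0]) * k / n,
                        a[1] + (b[1] - a[1]) * k / n)) for k in range(n + 1))


def subset_rrt(start: Point, goal: Point, is_free: Callable[[Point], bool],
               origin: Point, heading: float, length: float, width: float,
               step: float = 0.5, goal_tol: float = 0.5,
               max_samples: int = 5000,
               seed: int = 0) -> Tuple[Optional[List[Point]], int]:
    """RRT restricted to one rectangular subset (hypothetical parameters).
    Returns (path or None, number of random samples drawn)."""
    rng = random.Random(seed)
    nodes: List[Point] = [start]
    parent: List[int] = [-1]
    for n_samples in range(1, max_samples + 1):
        q_rand = sample_in_oriented_rect(rng, origin, heading, length, width)
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d == 0.0:
            continue
        t = min(step, d) / d  # steer at most `step` toward the sample
        q_new = (q_near[0] + t * (q_rand[0] - q_near[0]),
                 q_near[1] + t * (q_rand[1] - q_near[1]))
        if not segment_free(q_near, q_new, is_free):
            continue
        nodes.append(q_new)
        parent.append(i_near)
        if math.dist(q_new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i != -1:  # walk parent pointers back to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1], n_samples
    return None, max_samples
```

Because every sample lies in a narrow rectangle aligned with the commanded heading, the tree does not waste nodes exploring regions the instruction rules out; in LASMP a sequence of such subproblems is chained together, one per turn.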

Paper Structure (16 sections, 6 equations, 9 figures, 7 tables, 1 algorithm)

Figures (9)

  • Figure 1: A robot uses LASMP to receive high-level textual or speech instructions over the cloud, parses the instruction, and finds a collision-free path using a language-grounded RRT planner. (The objects in the figure, such as sofas and chairs, are adopted from rasouli2017effect.)
  • Figure 2: The RoBERTa model parses the user instruction to identify the navigation ("left", "right", etc.) and destination ("ZONE") entities. If only the "ZONE" entity is identified, it is passed through a neural network (blue shaded blocks) to identify the turn list. Alternatively, the turn list is extracted directly from the command and passed to our proposed subset sampling-based planner, as shown by the dashed arrow. LASMP then performs an efficient sampling-based path search by focusing on a subset of the workspace to draw valid state samples and produce a collision-free path to the goal. An extended automatic speech recognition (ASR) workflow, Fig. 2(b), shows the speech-to-text processing pipeline.
  • Figure 3: 3D occupancy grids of the planning scenarios used to evaluate the effectiveness of LASMP: (left) office space (OS), (middle) random obstacle scene (RO), and (right) domestic environment (DE).
  • Figure 4: Performance of (a) the Whisper model in transcribing speech to text and (b) the different language models on the NER task.
  • Figure 5: Performance of the RoBERTa model trained on our dataset in predicting navigation and destination entities from instructions.
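The Figure 2 pipeline turns NER output into a turn list and a destination before planning begins. A minimal sketch of that post-processing step is below, assuming the tagger emits one `(token, label)` pair per word with labels `"NAV"`, `"ZONE"`, or `"O"`. The label names, the `TURN_WORDS` vocabulary, and the single-letter turn codes are hypothetical; the paper's actual tag set and output format may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from navigation words to compact turn codes.
TURN_WORDS: Dict[str, str] = {"left": "L", "right": "R", "straight": "S"}


def entities_to_plan(tagged: List[Tuple[str, str]]) -> Tuple[List[str], Optional[str]]:
    """Collect turn cues (in utterance order) and the destination zone from
    (token, label) pairs. An empty turn list signals that the turns must be
    inferred separately, as in the ZONE-only branch of Figure 2."""
    turns: List[str] = []
    zone_tokens: List[str] = []
    for token, label in tagged:
        word = token.lower()
        if label == "NAV" and word in TURN_WORDS:
            turns.append(TURN_WORDS[word])
        elif label == "ZONE":
            zone_tokens.append(token)  # multi-word zones, e.g. "living room"
    destination = " ".join(zone_tokens) if zone_tokens else None
    return turns, destination
```

For "Go left then right to the kitchen" this yields `(["L", "R"], "kitchen")`; for "go to living room" it yields `([], "living room")`, the case where Figure 2 routes the destination through the additional network to recover the turn list.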

Theorems & Definitions (2)

  • Remark 1
  • Remark 2