ATLASv2: LLM-Guided Adaptive Landmark Acquisition and Navigation on the Edge

Mikolaj Walczak, Uttej Kallakuri, Tinoosh Mohsenin

TL;DR

ATLASv2 addresses the challenge of enabling complex, multi-task autonomous navigation and manipulation entirely on resource-constrained edge hardware. It fuses a compact, fine-tuned TinyLlama LLM with real-time object detection (YOLOv5n with TensorRT) and a ROS-based path planner, all running on the Jetson Nano, while dynamically expanding a knowledge base of landmarks from detected objects. Key contributions include a fully onboard architecture, knowledge-base building through environmental perception, and a low-latency, resource-conscious scheduling strategy demonstrated in real-world home-like and office-like settings, with onboard LLM performance benchmarked against a cloud LLM baseline. The results show that the onboard system can decompose high-level natural language tasks into low-level actions and execute them with competitive fidelity, while preserving privacy and independence from network access, albeit with higher latency and memory pressure than cloud-based solutions. This work advances practical edge-enabled embodied AI by bridging the gap between simulation and real-world deployment for hierarchical navigation and manipulation.

Abstract

Autonomous systems deployed on edge devices face significant challenges, including resource constraints, real-time processing demands, and the need to adapt to dynamic environments. This work introduces ATLASv2, a novel system that integrates a fine-tuned TinyLLM, real-time object detection, and efficient path planning to enable hierarchical, multi-task navigation and manipulation entirely on an edge device, the Jetson Nano. ATLASv2 dynamically expands its set of navigable landmarks by detecting and localizing objects in the environment, which are saved to its internal knowledge base for use in future task execution. We evaluate ATLASv2 in real-world environments, including handcrafted home and office settings constructed with diverse objects and landmarks. Results show that ATLASv2 effectively interprets natural language instructions, decomposes them into low-level actions, and executes tasks with high success rates. By leveraging generative AI in a fully on-board framework, ATLASv2 achieves optimized resource utilization with minimal prompting latency and power consumption, bridging the gap between simulated environments and real-world applications.
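
The expanding knowledge base described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LandmarkKB`, its methods, the prompt layout, and the coordinates are all hypothetical, chosen only to show how newly detected objects might become landmarks that later prompts can refer to.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LandmarkKB:
    """Hypothetical knowledge base mapping landmark names to map coordinates."""
    landmarks: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def add_detection(self, label: str, position: Tuple[float, float]) -> bool:
        """Store a newly detected object; return True only if it was new."""
        if label in self.landmarks:
            return False
        self.landmarks[label] = position
        return True

    def build_prompt(self, task: str) -> str:
        """Append the known landmark names to the user's task (assumed format)."""
        names = ", ".join(sorted(self.landmarks))
        return f"Task: {task}\nAvailable landmarks: {names}"


# Rooms known in advance (illustrative coordinates, not from the paper).
kb = LandmarkKB({"office": (0.0, 0.0), "meeting room": (3.5, 1.2)})

# An object detected while executing an earlier task is saved as a landmark.
kb.add_detection("teddy bear", (2.1, 0.8))

# A later task can now refer to the newly acquired landmark.
prompt = kb.build_prompt("bring the teddy bear to the office")
```

In this sketch a repeated detection of the same label is ignored, so the first recorded position is kept; a real system would need its own policy for objects that move.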

Paper Structure

This paper contains 9 sections, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Block diagram showing how ATLASv2 is used, given two sample prompts provided by the user. The system begins with a prompt received from the user (Task 1), to which the knowledge base is appended to provide the available landmarks. The LLM then generates a plan (if available) that is executed in sequence by the navigation and manipulation package. If a new object of interest (e.g., a laptop or teddy bear) is detected during execution, the object and its location, visualized using RViz (doi:10.1007/s11235-015-0034-5), are appended to the expanding knowledge base. The user can then use the newly detected objects in future tasks (Task 2). The full system is deployed on the Jetson Nano edge device and receives prompts remotely from the user.
  • Figure 2: The robotic agent and real-world environments used to deploy and evaluate ATLASv2. (a) The Yahboom Transbot, which carries a 2D lidar for mapping, an Orbbec Astra Pro camera for depth and RGB input, a 3-axis robotic arm for manipulating objects, crawler tracks for moving around the environment, and a Jetson Nano. (b) The emulated home environment, which includes a living room, kitchen, and kids' room, each populated with objects typical of a residential setting. (c) The emulated office environment, comprising a lounge, office, lobby, and meeting room, with items representative of a professional workplace.
  • Figure 3: Samples from the office setting experiment showing the agent successfully completing high-level tasks provided to the onboard LLM. The figure displays two of the prompts executed by the agent: "Go to the meeting room" (A, B, and C) and "I'm feeling lonely, bring the teddy bear to the office" (C, D, E, and F). For the first prompt, the agent successfully navigates to the meeting room (C) and adds the "teddy bear" to the knowledge base along its path (B). Then, for the second prompt, the agent successfully performs actions representing grabbing the teddy bear (E) and moving it to the office (F).
  • Figure 4: Total power, RAM utilization, swap utilization, and latency results for navigation- and manipulation-based tasks on the Jetson Nano, run in 10 W mode with a 1.43 GHz CPU clock frequency, a 921 MHz GPU clock frequency, 4 GB of RAM, and 8 GB of swap, during the real-world office setting experiment for the cloud and onboard implementations. The figure shows results over time for (a) power consumption when deploying the system using the cloud LLM Cohere, (b) memory utilization for the system using the cloud LLM Cohere, (c) power consumption for the fully onboard system, and (d) memory utilization for the fully onboard system, along with (e) bar plots of peak power and memory utilization and of prompt processing latency for simple navigation prompts and more complex manipulation prompts.