Aim My Robot: Precision Local Navigation to Any Object
Xiangyun Meng, Xuning Yang, Sanghun Jung, Fabio Ramos, Srid Sadhan Jujjavarapu, Sanjoy Paul, Dieter Fox
TL;DR
The paper addresses high-precision, object-centric navigation without maps or CAD models. It introduces Aim-My-Robot (AMR), a vision-based local navigation system that takes RGB-D and LiDAR inputs together with a reference image and an object mask, and reaches the target object with centimeter-level precision. AMR is trained on large-scale photorealistic data generated in simulation and uses a transformer-based architecture with three stages: multi-modal sensor encoding, goal- and robot-aware fusion, and autoregressive motion generation, which outputs base trajectories and camera tilt commands. The approach shows strong sim2real transfer, generalizes to unseen objects and different robot kinematics, and supports downstream tasks such as docking and manipulation with little to no fine-tuning. AMR is thus a map-free option for precise object-centric navigation that can be combined with higher-level planners.
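To make the term "autoregressive motion generation" concrete, the following is a minimal sketch of an autoregressive decoding loop, in which each waypoint is predicted from a fixed context plus the waypoints already produced. It is not the AMR model: `generate_trajectory`, `toy_predictor`, the `(x, y, yaw)` waypoint layout, and the use of a goal pose as the context are all assumptions made for illustration, and the toy predictor stands in for a learned transformer decoder.

```python
from typing import Callable, List, Tuple

# Assumed waypoint layout: (x, y, yaw) expressed in the robot's start frame.
Waypoint = Tuple[float, float, float]


def generate_trajectory(
    predict_next: Callable[[Waypoint, List[Waypoint]], Waypoint],
    context: Waypoint,
    horizon: int,
) -> List[Waypoint]:
    """Autoregressive rollout: every step sees all previously generated waypoints."""
    waypoints: List[Waypoint] = []
    for _ in range(horizon):
        waypoints.append(predict_next(context, waypoints))
    return waypoints


def toy_predictor(context: Waypoint, previous: List[Waypoint]) -> Waypoint:
    """Hypothetical stand-in for a learned decoder: move halfway to the goal pose."""
    gx, gy, gyaw = context
    x, y, yaw = previous[-1] if previous else (0.0, 0.0, 0.0)
    return (x + 0.5 * (gx - x), y + 0.5 * (gy - y), yaw + 0.5 * (gyaw - yaw))


# Goal pose 1 m straight ahead, same heading as the start pose.
traj = generate_trajectory(toy_predictor, (1.0, 0.0, 0.0), horizon=3)
```

In a learned system the context would be the fused sensor and goal features rather than a known goal pose; the loop structure is the only part this sketch is meant to convey.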
Abstract
Most existing navigation systems count a run as a "success" when the robot comes within a 1 m radius of the goal. This precision is insufficient for emerging applications in which the robot must be positioned precisely relative to an object for downstream tasks such as docking, inspection, and manipulation. To this end, we design and implement Aim-My-Robot (AMR), a local navigation system that enables a robot to reach any object in its vicinity at a desired relative pose with centimeter-level precision. AMR achieves high precision and robustness by leveraging multi-modal perception and precise action prediction, and by training on large-scale photorealistic data generated in simulation. AMR shows strong sim2real transfer and can adapt to different robot kinematics and unseen objects with little to no fine-tuning.
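The gap between the two success criteria can be illustrated with a small check on planar poses. This is a sketch under stated assumptions: poses are `(x, y, yaw)` tuples in metres and radians, the function names are hypothetical, and the 5 cm and 5 degree tolerances are illustrative values chosen here, not thresholds reported by the paper.

```python
import math


def pose_error(robot_pose, goal_pose):
    """Return (position error in metres, absolute heading error in radians)."""
    rx, ry, ryaw = robot_pose
    gx, gy, gyaw = goal_pose
    dist = math.hypot(gx - rx, gy - ry)
    # Wrap the heading difference into [-pi, pi) before taking its magnitude.
    dyaw = (gyaw - ryaw + math.pi) % (2 * math.pi) - math.pi
    return dist, abs(dyaw)


def reached(robot_pose, goal_pose, pos_tol_m, yaw_tol_rad):
    """True if both position and heading errors are within tolerance."""
    dist, dyaw = pose_error(robot_pose, goal_pose)
    return dist <= pos_tol_m and dyaw <= yaw_tol_rad


robot = (0.30, 0.40, 0.0)            # 0.5 m away from the goal position
goal = (0.0, 0.0, math.radians(20))  # desired pose relative to the object

# Coarse criterion: within a 1 m radius, heading ignored.
coarse = reached(robot, goal, pos_tol_m=1.0, yaw_tol_rad=math.pi)
# Fine criterion (assumed values): 5 cm and 5 degrees.
fine = reached(robot, goal, pos_tol_m=0.05, yaw_tol_rad=math.radians(5))
```

Here the same final pose passes the coarse check and fails the fine one, which is the situation the abstract describes for tasks such as docking or manipulation.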
