RePLan: Robotic Replanning with Perception and Language Models
Marta Skreta, Zihan Zhou, Jia Lin Yuan, Kourosh Darvish, Alán Aspuru-Guzik, Animesh Garg
TL;DR
RePLan is a hierarchical, perception-grounded framework for online replanning in long-horizon robotic tasks. It integrates a high-level LLM planner, a Vision-Language Model perceiver, a low-level reward translator, a MuJoCo-based motion controller, and an LLM/VLM verifier. The work also introduces the RC Benchmark to evaluate open-ended, multi-step planning with perception feedback and replanning. Empirically, RePLan achieves a success rate roughly four times higher than a language-to-reward baseline and is shown to be applicable to real robots, although its performance depends notably on perceptual grounding. The work highlights the critical role of a multi-stage verifier in improving robustness and discusses limitations related to VLM spatial reasoning and perception reliability, pointing to future improvements in vision-grounded reasoning for robotics.
Abstract
Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning and robotics planning. Recently, LLMs have also been able to generate reward functions for low-level robot actions, effectively bridging the interface between high-level planning and low-level robot control. However, the challenge remains that even with syntactically correct plans, robots can still fail to achieve their intended goals due to imperfect plans or unexpected environmental issues. Vision Language Models (VLMs), which have shown remarkable success in tasks such as visual question answering, offer a way to overcome this. Leveraging the capabilities of VLMs, we present a novel framework called Robotic Replanning with Perception and Language Models (RePLan) that enables online replanning for long-horizon tasks. This framework uses the physical grounding provided by a VLM's understanding of the world's state to adapt robot actions when the initial plan fails to achieve the desired goal. We developed a Reasoning and Control (RC) benchmark with eight long-horizon tasks to test our approach. We find that RePLan enables a robot to successfully adapt to unforeseen obstacles while accomplishing open-ended, long-horizon goals, where baseline models cannot, and can be readily applied to real robots. More information is available at https://replan-lm.github.io/replan.github.io/
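
To make the replanning idea concrete, the following is a minimal Python sketch of a plan, verify, execute, perceive loop. It is an illustration under stated assumptions and not the authors' implementation: the class ReplanLoop, its fields, and the toy cabinet scenario are all hypothetical, and the LLM planner, VLM perceiver, reward translator, motion controller, and verifier are replaced by plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReplanLoop:
    """Hypothetical orchestration of the five components named in the text."""
    plan: Callable[[str, List[str]], List[str]]  # high-level planner: (goal, feedback) -> subtasks
    verify: Callable[[List[str]], bool]          # verifier: is the plan acceptable?
    to_reward: Callable[[str], str]              # reward translator: subtask -> reward spec
    execute: Callable[[str], bool]               # motion controller: reward spec -> success flag
    perceive: Callable[[str], str]               # perceiver: question -> scene description
    max_replans: int = 3
    feedback: List[str] = field(default_factory=list)

    def run(self, goal: str) -> bool:
        # One initial attempt plus up to max_replans replanning rounds.
        for _ in range(self.max_replans + 1):
            subtasks = self.plan(goal, self.feedback)
            if not self.verify(subtasks):
                self.feedback.append("plan rejected by verifier")
                continue
            if self._execute_all(subtasks):
                return True
        return False

    def _execute_all(self, subtasks: List[str]) -> bool:
        for subtask in subtasks:
            if not self.execute(self.to_reward(subtask)):
                # On failure, ask the perceiver why, and keep the answer for replanning.
                self.feedback.append(self.perceive(f"Why did '{subtask}' fail?"))
                return False
        return True


def make_toy_loop(max_replans: int = 3) -> ReplanLoop:
    """Toy scenario (an assumption): a cabinet door blocked by an object."""
    state: Dict[str, bool] = {"blocked": True}

    def plan(goal: str, feedback: List[str]) -> List[str]:
        if any("blocking" in note for note in feedback):
            return ["remove obstacle", "open cabinet"]
        return ["open cabinet"]

    def execute(reward_spec: str) -> bool:
        if reward_spec == "reward:remove obstacle":
            state["blocked"] = False
            return True
        return not state["blocked"]  # opening succeeds only once unblocked

    return ReplanLoop(
        plan=plan,
        verify=lambda subtasks: len(subtasks) > 0,
        to_reward=lambda subtask: f"reward:{subtask}",
        execute=execute,
        perceive=lambda question: "an object is blocking the cabinet door",
        max_replans=max_replans,
    )
```

In this toy setup, make_toy_loop().run("open the cabinet") fails on the first attempt, records the perceiver's description as feedback, and succeeds after one replan. With max_replans=0 the same call returns False, which mirrors the claim that a planner without perception feedback cannot recover from the unforeseen obstacle.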
