MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment
Venkata Naren Devarakonda, Ali Umut Kaypak, Shuaihang Yuan, Prashanth Krishnamurthy, Yi Fang, Farshad Khorrami
TL;DR
This work tackles the instability of LLM-driven robotic task planning by introducing MultiTalk, a framework that couples introspective and extrospective dialogue loops with four specialized modules: Perceptor, Planner, Analyzer, and Simulator. The Perceptor grounds plans in the real world using RealSense sensing and Grounded SAM, while the Planner generates executable action sequences and requests clarification when needed. The Analyzer critiques plans for logical, syntactic, and grounding errors, and the Simulator verifies physical feasibility in a MuJoCo environment, flagging issues such as collisions and kinematic singularities (the latter detected via the Jacobian condition number). Experiments on a 7-DoF robot arm across multiple tasks show that the integrated feedback loops significantly improve plan correctness and feasibility, outperforming baselines that lack structured critique and grounding. The approach demonstrates robust, scalable robotic task planning at a modest API cost, highlighting its practical potential for embodied agents in dynamic environments.
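The singularity check mentioned above relies on the Jacobian condition number κ(J) = σ_max / σ_min, which grows without bound as the arm approaches a configuration where it loses a direction of end-effector motion. The sketch below illustrates that check on a planar two-link arm, a deliberate simplification of the 7-DoF arm used in the experiments. The link lengths, the function names (planar_2link_jacobian, condition_number_2x2, is_near_singular), and the threshold of 100 are hypothetical and are not taken from the paper. For a 2×2 Jacobian, σ_max·σ_min = |det J| and σ_max² + σ_min² = ‖J‖_F², so κ can be computed in closed form without a linear-algebra library.

```python
import math

# Hypothetical threshold: the source does not state the value MultiTalk uses.
COND_THRESHOLD = 100.0


def planar_2link_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Position Jacobian (2x2) of a planar two-link arm's end effector."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def condition_number_2x2(J):
    """Ratio sigma_max / sigma_min of a 2x2 matrix, computed in closed form."""
    (a, b), (c, d) = J
    det = a * d - b * c                      # sigma_max * sigma_min = |det J|
    fro2 = a * a + b * b + c * c + d * d     # sigma_max^2 + sigma_min^2
    if abs(det) < 1e-12:
        return math.inf                      # numerically singular
    # Largest eigenvalue of J^T J, i.e. sigma_max^2 (root of l^2 - fro2*l + det^2).
    lam_max = fro2 / 2 + math.sqrt(max(fro2 * fro2 / 4 - det * det, 0.0))
    return lam_max / abs(det)                # sigma_max^2 / (sigma_max * sigma_min)


def is_near_singular(q1, q2, threshold=COND_THRESHOLD):
    """Flag configurations whose Jacobian is badly conditioned."""
    return condition_number_2x2(planar_2link_jacobian(q1, q2)) > threshold


if __name__ == "__main__":
    print(condition_number_2x2(planar_2link_jacobian(0.0, math.pi / 2)))  # ≈ 2.618
    print(is_near_singular(0.0, 0.005))  # True: elbow almost fully stretched
```

For a 7-DoF arm the positional and rotational Jacobian is 6×7, so in practice κ would be computed from its singular values with a numerical SVD rather than with this 2×2 closed form.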
Abstract
Large language models (LLMs) have shown promising results in task planning owing to their strong natural language understanding and reasoning capabilities. However, hallucinations, ambiguities in human instructions, environmental constraints, and limitations in the executing agent's capabilities often lead to flawed or incomplete plans. This paper proposes MultiTalk, an LLM-based task planning methodology that addresses these issues through a framework of introspective and extrospective dialogue loops. This approach helps ground generated plans in the context of the environment and the agent's capabilities, while also resolving uncertainties and ambiguities in the given task. These loops are enabled by specialized systems designed to extract and predict task-specific states and to flag mismatches or misalignments among the human user, the LLM agent, and the environment. Effective feedback pathways between these systems and the LLM planner foster meaningful dialogue. The efficacy of this methodology is demonstrated through its application to robotic manipulation tasks. Experiments and ablations highlight the robustness and reliability of our method, and comparisons with baselines further illustrate the superiority of MultiTalk in task planning for embodied agents.
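The abstract does not specify how the feedback pathways between the checking systems and the LLM planner are implemented. As a rough illustration only, the following is a generic generate-critique-revise loop, assuming each checking system returns a list of textual issues that is fed back to the planner until no issues remain or a round limit is hit. All names (plan_with_feedback, toy_planner, toy_analyzer, toy_capabilities, SKILLS) are hypothetical, and the toy planner is a deterministic stand-in for the LLM rather than MultiTalk's actual prompting scheme.

```python
from typing import Callable, List, Tuple

Plan = List[str]
Critic = Callable[[Plan], List[str]]          # returns issue descriptions (empty = OK)
Planner = Callable[[str, List[str]], Plan]    # (task, feedback) -> plan

SKILLS = {"move_to", "pick", "place"}         # hypothetical robot skill set


def plan_with_feedback(task: str, planner: Planner, critics: List[Critic],
                       max_rounds: int = 5) -> Tuple[Plan, List[str]]:
    """Re-plan until every critic is satisfied or max_rounds is reached."""
    feedback: List[str] = []
    plan: Plan = []
    for _ in range(max_rounds):
        plan = planner(task, feedback)
        feedback = [issue for critic in critics for issue in critic(plan)]
        if not feedback:
            return plan, []
    return plan, feedback                     # issues still unresolved


def toy_planner(task: str, feedback: List[str]) -> Plan:
    # Stand-in for the LLM: omits the grasp step unless feedback mentions it.
    plan = ["move_to(cup)", "place(cup, shelf)"]
    if any("pick" in issue for issue in feedback):
        plan.insert(1, "pick(cup)")
    return plan


def toy_analyzer(plan: Plan) -> List[str]:
    # Logical check: the cup must be picked before it is placed.
    place, pick = "place(cup, shelf)", "pick(cup)"
    if place in plan and (pick not in plan or plan.index(pick) > plan.index(place)):
        return [f"{place} requires a prior {pick}"]
    return []


def toy_capabilities(plan: Plan) -> List[str]:
    # Capability check: every step must use a skill the robot actually has.
    return [f"unknown skill in {step}" for step in plan
            if step.split("(")[0] not in SKILLS]


if __name__ == "__main__":
    plan, issues = plan_with_feedback("put the cup on the shelf", toy_planner,
                                      [toy_analyzer, toy_capabilities])
    print(plan, issues)
```

In this toy run, the first plan is rejected by the logical check, the feedback prompts the planner to insert the grasp step, and the second plan passes both critics.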
