
Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity

Kaiqu Liang, Zixu Zhang, Jaime Fernández Fisac

TL;DR

This work tackles unsafe, uncertain robot planning under natural language ambiguity by introducing introspective planning, which builds a knowledge base of human-aligned introspective rationales and retrieves them to guide LLM-based planning. It couples this with conformal prediction to provide statistically guaranteed, calibrated prediction sets, aiming to reduce unnecessary user clarifications while maintaining safety and goal alignment. Evaluations across three benchmarks, including a new Safe Mobile Manipulation dataset, show that introspection improves compliance and safety, and that combining it with conformal prediction tightens confidence bounds while preserving statistical success guarantees. The approach highlights a practical path toward uncertainty-aware, instruction-grounded robotics with measurable safety assurances and reduced human intervention.

Abstract

Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or even unsafe in critical scenarios. Additionally, inherent ambiguity in natural language instructions can introduce uncertainty into the LLM's reasoning and planning processes. We propose introspective planning, a systematic approach that aligns the LLM's uncertainty with the inherent ambiguity of the task. Our approach constructs a knowledge base containing introspective reasoning examples as post-hoc rationalizations of human-selected safe and compliant plans, which are retrieved during deployment. Evaluations on three tasks, including a newly introduced safe mobile manipulation benchmark, demonstrate that introspection substantially improves both compliance and safety over state-of-the-art LLM-based planning methods. Furthermore, we empirically show that introspective planning, in combination with conformal prediction, achieves tighter confidence bounds, maintaining statistical success guarantees while minimizing unnecessary user clarification requests. The webpage and code are accessible at https://introplan.github.io.
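The retrieval step in the abstract can be pictured as a nearest-neighbour lookup over stored rationales whose results are prepended to the planning prompt. This is a minimal sketch under stated assumptions. A toy bag-of-words cosine similarity stands in for whatever text embedding the authors actually use. The knowledge-base entries, the prompt format, and the function names (`embed`, `retrieve`, `build_prompt`) are all hypothetical and do not come from the paper's code.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical entries: post-hoc rationales for human-selected safe, compliant plans.
KNOWLEDGE_BASE = [
    {"instruction": "Bring me a soda",
     "rationale": "Coke and Sprite are both on the counter and the user named no brand, "
                  "so both are valid; ask which one."},
    {"instruction": "Heat the metal bowl in the microwave",
     "rationale": "Metal in a microwave is unsafe, so do not comply; explain and ask."},
    {"instruction": "Put the apple in the fridge",
     "rationale": "Only one apple is visible, so the instruction is unambiguous; act."},
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumed stand-in for a learned sentence embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def retrieve(knowledge_base: list[dict], instruction: str, k: int = 2) -> list[dict]:
    """Return the k entries whose stored instruction is most similar to the new one."""
    query = embed(instruction)
    ranked = sorted(knowledge_base,
                    key=lambda e: cosine(query, embed(e["instruction"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(examples: list[dict], instruction: str) -> str:
    """Prepend retrieved rationales as few-shot context before the new task."""
    shots = "\n\n".join(
        f"Instruction: {e['instruction']}\nExplanation: {e['rationale']}" for e in examples
    )
    return f"{shots}\n\nInstruction: {instruction}\nExplanation:"


query = "Bring me a drink from the counter"
print(build_prompt(retrieve(KNOWLEDGE_BASE, query, k=1), query))
```

Here the soda entry is retrieved because it shares the most words with the new request. The LLM would then be asked to complete the final explanation and score the candidate options, with the retrieved rationale nudging it to ask when a request is ambiguous or unsafe.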

Paper Structure (23 sections, 9 equations, 12 figures, 13 tables, 2 algorithms)

Figures (12)

  • Figure 1: Illustration of the introspective planning pipeline. Knowledge base construction: The LLM generates knowledge entries based on human-provided instructions and valid options. Deployment: Upon receiving an instruction, the LLM formulates possible next steps, consults the knowledge base to retrieve the most relevant examples, and uses them as prompts for prediction.
  • Figure 2: Demonstration of using conformal prediction with Introspective Planning. After generating multiple options, we query the LLM for an explanation via introspective planning and then ask the model to predict the most correct option. Based on the likelihood scores of the true intents in a calibration dataset, conformal prediction finds the quantile value $\hat{q}$ (0.85) and includes every option scoring $\geq 1 - \hat{q} = 0.15$ in the prediction set for each test scenario. This guarantees, at a user-specified confidence level, that the correct option is included in the prediction set.
  • Figure 3: Qualitative results on Safe Mobile Manipulation. We compared our approach with KnowNo (knowno2023), both using conformal prediction with an 85% target success rate. Our method generates explanations via introspective planning before applying conformal prediction, whereas KnowNo directly predicts valid options using conformal prediction. We observed that KnowNo over-steps in the left case and over-asks in the right case, while IntroPlan generates more precise prediction sets.
  • Figure 4: Variation of different performance metrics with respect to the Target Success Rate (TSR). Each subplot compares KnowNo, Retrieval-Q-CoT, and Ours (Conformal) methods across various metrics. Introspective planning (Ours-Conformal) consistently achieves the best tradeoff between performance metrics and Target Success Rate (TSR) across all comparisons.
  • Figure 5: Variation of different performance metrics with respect to the Target Success Rate on Mobile Manipulation using GPT-3.5. Each subplot compares KnowNo, Retrieval-Q-CoT, and Ours (Conformal) methods across various metrics. Introspective planning (Ours-Conformal) consistently achieves the best tradeoff between metrics and Target Success Rate across all comparisons.
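The calibration rule in the Figure 2 caption can be sketched as standard split conformal prediction. This is a minimal sketch assuming exactly one true option per calibration scenario and likelihood scores already extracted from the LLM. The function names and all numbers are hypothetical, not taken from the paper's code.

```python
import math


def conformal_quantile(calib_true_scores, target_success_rate):
    """Return q_hat from the likelihoods the LLM gave the true option on calibration data.

    Nonconformity score of a calibration example = 1 - likelihood of its true option.
    q_hat is the ceil((n + 1) * TSR)-th smallest nonconformity score.
    """
    n = len(calib_true_scores)
    nonconformity = sorted(1.0 - s for s in calib_true_scores)
    k = math.ceil((n + 1) * target_success_rate)
    if k > n:
        # Too few calibration points for this TSR: include every option (threshold 0).
        return 1.0
    return nonconformity[k - 1]


def prediction_set(option_scores, q_hat):
    """Keep every option whose likelihood is >= 1 - q_hat (the rule in Figure 2)."""
    threshold = 1.0 - q_hat
    return sorted(opt for opt, p in option_scores.items() if p >= threshold)


# Hypothetical likelihoods of the true option on 10 calibration scenarios.
calib = [0.97, 0.93, 0.9, 0.88, 0.8, 0.75, 0.6, 0.5, 0.3, 0.1]
q_hat = conformal_quantile(calib, target_success_rate=0.8)

# Hypothetical option scores for one test scenario after introspective reasoning.
test_scores = {"A": 0.55, "B": 0.4, "C": 0.05}
print(q_hat, prediction_set(test_scores, q_hat))
```

With a target success rate of 0.8, this gives a $\hat{q}$ of about 0.7, so both A and B clear the 0.3 threshold and the robot would ask the user to choose between them. If calibration and test scenarios are exchangeable, the sets contain the true option with probability at least the target success rate. Raising the TSR can only keep or raise $\hat{q}$, and so can only keep or enlarge the sets. This is the tradeoff plotted in Figures 4 and 5.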
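Figure 3 refers to over-stepping and over-asking. The sketch below shows one plausible way to score a conformal planner along these lines. The metric definitions are simplified assumptions for illustration, and the paper's exact definitions may differ. `Episode` and `planner_rates` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Episode:
    predicted: frozenset[str]  # options in the conformal prediction set
    valid: frozenset[str]      # options a human labeled safe and compliant


def planner_rates(episodes: list[Episode]) -> dict[str, float]:
    """Fraction of episodes satisfying each (assumed) metric definition."""
    n = len(episodes)

    def rate(predicate) -> float:
        return sum(1 for e in episodes if predicate(e)) / n

    return {
        # The robot asks for clarification whenever the set holds more than one option.
        "help": rate(lambda e: len(e.predicted) > 1),
        # Success if the set contains at least one valid option (the user can pick it).
        "success": rate(lambda e: bool(e.predicted & e.valid)),
        # The set matches the human-labeled valid options exactly.
        "exact_set": rate(lambda e: e.predicted == e.valid),
        # Over-ask (assumed): asks although exactly one option was valid.
        "over_ask": rate(lambda e: len(e.predicted) > 1 and len(e.valid) == 1),
        # Over-step (assumed): acts alone although the instruction was ambiguous.
        "over_step": rate(lambda e: len(e.predicted) == 1 and len(e.valid) > 1),
    }


F = frozenset
episodes = [
    Episode(F({"A"}), F({"A"})),            # correct, acts alone
    Episode(F({"A", "B"}), F({"A"})),       # over-ask
    Episode(F({"A"}), F({"A", "B"})),       # over-step
    Episode(F({"A", "B"}), F({"A", "B"})),  # correctly asks
    Episode(F({"C"}), F({"A"})),            # wrong action
]
print(planner_rates(episodes))
```

Under these definitions, a planner that asks more often trades a lower over-step rate for a higher over-ask and help rate. Precise prediction sets reduce both at once.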