Table of Contents
Fetching ...

VeriPlan: Integrating Formal Verification and LLMs into End-User Planning

Christine Lee, David Porfirio, Xinyu Jessica Wang, Kevin Zhao, Bilge Mutlu

TL;DR

VeriPlan tackles the challenge of deploying LLM-driven end-user planning by introducing a formal verification layer that uses model checking to enforce user-defined constraints on LLM outputs. The system couples a rule translator, flexibility sliders, and an external model checker with an iterative LLM planning loop to provide deterministic boundaries and transparent feedback, while preserving user control and creativity. A within-subject user study (n=12) shows that VeriPlan improves perceived output quality, usefulness, satisfaction, and efficiency, with the model checker enabling clearer guidance and faster plan convergence. The work offers practical design implications for integrating verification and multi-dimensional user control into LLM systems, and demonstrates how external verification can substantially increase reliability and user trust in everyday planning tasks.

Abstract

Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three additional core features -- a rule translator, flexibility sliders, and a model checker -- that engage users in the verification process. Through a user study (n=12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.

VeriPlan: Integrating Formal Verification and LLMs into End-User Planning

TL;DR

VeriPlan tackles the challenge of deploying LLM-driven end-user planning by introducing a formal verification layer that uses model checking to enforce user-defined constraints on LLM outputs. The system couples a rule translator, flexibility sliders, and an external model checker with an iterative LLM planning loop to provide deterministic boundaries and transparent feedback, while preserving user control and creativity. A within-subject user study (n=12) shows that VeriPlan improves perceived output quality, usefulness, satisfaction, and efficiency, with the model checker enabling clearer guidance and faster plan convergence. The work offers practical design implications for integrating verification and multi-dimensional user control into LLM systems, and demonstrates how external verification can substantially increase reliability and user trust in everyday planning tasks.

Abstract

Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three additional core features -- a rule translator, flexibility sliders, and a model checker -- that engage users in the verification process. Through a user study (n=12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.

Paper Structure

This paper contains 64 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: VeriPlan Interface --- The front-end interface of VeriPlan. We outline the user's interaction with the front-end as a guide to explain the pipeline of VeriPlan in Section \ref{['sec:tech']}.
  • Figure 2: VeriPlan LLM Planner --- Pipeline of the LLM planner described in Section \ref{['sec:llmPlanner']}. When the user submits an initial prompt in natural language, an LLM agent generates a plan based on the user's input. This plan is later to be verified by the model checker. Simultaneously, the user's prompt is sent to the rule translator to initiate the verification process.
  • Figure 3: VeriPlan Rule Translator --- Pipeline of the rule translator described in Section \ref{['sec:ruleTranslator']}. The translator extracts a set of constraints from the user's initial natural language input that must be adhered to for a correct plan. These constraints are mapped to appropriate LTL properties within the temporal constraint template (Table \ref{['fig:template']}) for model checking. Using this template, the constraints are converted into LTL and PRISM language for model checking, and then presented back in natural language for user verification.
  • Figure 4: VeriPlan Model Checker --- Pipeline of the model checker described in Section \ref{['sec:modelChecker']}. The model checker takes in the LLM-generated plan in PRISM language and the set of rules from the previous stage, and then evaluates the plan against these rules. The plan's validity, along with any violated rules, is sent to both the user and the LLM agents to refine future solutions.
  • Figure 5: Quantitative Data from User Study --- Bar graphs on participants' perceived performance of LLM, usefulness, ease of use, and satisfaction scores across different conditions. Horizontal lines indicate significant pairwise comparisons with Dunnett's test ($p < 0.05^{\ast}$, $p < 0.01^{\ast\ast}$, $p < 0.001^{\ast\ast\ast}$). Vertical lines in each bar graph indicate standard error.