Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models
Angelos Mavrogiannis, Christoforos Mavrogiannis, Yiannis Aloimonos
TL;DR
Cook2LTL is a framework that translates free-form cooking recipes into robot-executable temporal logic by grounding high-level actions to a primitive action set and caching deconstructed actions in a dynamic library. The approach uses a semantic parser trained on Recipe1M+ data and LLM-based action reduction to map actions to primitives, then translates action sequences into LTL formulae such as $F(\psi_1 \wedge F(\psi_2 \wedge \dots))$, where the nested "eventually" operator $F$ captures temporal order. An ablation study shows substantial reductions in API calls, latency, and cost when using the action library, while maintaining high executability across recipes. Demonstrations in AI2-THOR indicate the method's potential for sim-to-real transfer, though the results highlight sensitivity to initial LLM outputs and the need for robust execution feedback and more extensive datasets. The work advances practical, temporally-aware task planning for cooking in robotics by combining semantic parsing, few-shot prompting, and formal temporal logic grounding.
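As a rough illustration of the nested formula above, the sketch below builds a string of the form F(p1 & F(p2 & ...)) from an ordered list of propositions. The function name `sequence_to_ltl`, the ASCII `&` standing in for $\wedge$, and the example propositions are assumptions made for illustration; they are not taken from the Cook2LTL implementation.

```python
from typing import Sequence


def sequence_to_ltl(props: Sequence[str]) -> str:
    """Nest 'eventually' (F) operators so that each proposition
    must hold at some point after the one before it."""
    if not props:
        raise ValueError("need at least one proposition")
    # Start from the innermost (last) proposition and wrap outward.
    formula = f"F({props[-1]})"
    for p in reversed(props[:-1]):
        formula = f"F({p} & {formula})"
    return formula


if __name__ == "__main__":
    # Hypothetical primitive actions, used only as example propositions.
    steps = ["pickup(knife)", "slice(tomato)", "put(knife, counter)"]
    print(sequence_to_ltl(steps))
```

Building from the last proposition outward keeps the nesting explicit: the innermost F holds the final step, and each outer F requires its own step together with everything that must follow it.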
Abstract
Cooking recipes are challenging to translate to robot plans as they feature rich linguistic complexity, temporally-extended interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained Large Language Model (LLM), we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet to a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL makes use of a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR), and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.
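The caching idea described above can be sketched as a small lookup table placed in front of the LLM. This is a minimal illustration under stated assumptions, not the authors' code: the class `ActionLibrary`, the primitive set `PRIMITIVES`, and the stand-in `fake_llm` function are all hypothetical, and a real system would replace `fake_llm` with an actual LLM query.

```python
from typing import Callable, Dict, List

# Hypothetical primitive set; the real one depends on the robot and environment.
PRIMITIVES = {"pickup", "put", "slice", "open", "close", "toggle_on"}


class ActionLibrary:
    """Caches decompositions of high-level actions into primitive actions."""

    def __init__(self, decompose: Callable[[str], List[str]]) -> None:
        self._decompose = decompose          # stand-in for an LLM query
        self._cache: Dict[str, List[str]] = {}
        self.llm_calls = 0                   # counts how often the LLM was queried

    def reduce(self, action: str) -> List[str]:
        # Primitive actions need no decomposition.
        if action in PRIMITIVES:
            return [action]
        # Query the LLM only the first time an action is seen.
        if action not in self._cache:
            self.llm_calls += 1
            self._cache[action] = self._decompose(action)
        return list(self._cache[action])


def fake_llm(action: str) -> List[str]:
    """Hypothetical stub that returns a fixed decomposition instead of calling an LLM."""
    table = {"microwave": ["open", "put", "close", "toggle_on"]}
    return table.get(action, [])


if __name__ == "__main__":
    lib = ActionLibrary(fake_llm)
    for step in ["microwave", "slice", "microwave"]:
        print(step, "->", lib.reduce(step))
    print("LLM calls:", lib.llm_calls)
```

In this sketch the second occurrence of `microwave` is served from the library, so the stub is queried once rather than twice; savings of this kind are what the action library is meant to provide, though the figures reported in the abstract come from the authors' experiments, not from this example.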
