Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL

Naoaki Kanazawa, Kento Kawaharazuka, Yoshiki Obinata, Kei Okada, Masayuki Inaba

TL;DR

A robot system that combines two components: real-world executable cooking behavior planning, which uses a Large Language Model (LLM) together with classical planning over PDDL descriptions, and food ingredient state recognition, which is learned from a small amount of data using a Vision-Language Model (VLM).

Abstract

Although demand is growing for cooking as one of the tasks expected of robots, no robot has yet carried out a series of cooking behaviours in the real world from a new recipe description. In this study, we propose a robot system that combines two components: real-world executable cooking behaviour planning, which uses a Large Language Model (LLM) together with classical planning over PDDL descriptions, and food ingredient state recognition, which is learned from a small amount of data using a Vision-Language Model (VLM). In experiments, PR2, a dual-armed wheeled robot, successfully cooked from new, arranged recipes in a real-world environment, confirming the effectiveness of the proposed system.

Paper Structure (13 sections, 23 figures, 2 tables)

This paper contains 13 sections, 23 figures, 2 tables.

Figures (23)

  • Figure 1: Real-world cooking robot system that accounts for food state changes, working from recipe descriptions with foundation models and classical PDDL planning. The Large Language Model (LLM) converts the input recipe description into a cooking function sequence, and classical planning over a PDDL description turns that sequence into an executable action procedure. The robot performs cooking actions while recognizing state changes of the ingredients, using food state recognition learned from a small amount of data with the Vision-Language Model (VLM). Motions are executed using predefined action trajectories.
  • Figure 2: Known recipes covered in this study. Natural language description of the steps of three recipes for sunny-side up, poached egg, and scrambled egg, and human annotation of the cooking function sequences.
  • Figure 3: Unknown recipes covered in this study. Natural language description of two new recipes for "Butter arranged sunny-side up" and "Boiled and sauteed broccoli".
  • Figure 4: Problem setting for real-world robot cooking from recipes in this study, showing the kitchen environment and the robot used. In the image of the kitchen environment, spots that represent locations in the PDDL description are shown in blue, and objects such as tools and vessels are shown in orange. As the figure shows, we consider a problem setting in which PR2, a dual-armed wheeled robot, cooks in a kitchen environment with an IH stove and a water tap.
  • Figure 5: Real-world executable robotic cooking action planning from a recipe. First, the input recipe description in natural language is converted into a cooking function sequence that the robot can interpret, using few-shot prompting of a Large Language Model (LLM). The black text in the figure shows the actual prompt, whose final recipe section depends on the natural language description of the recipe to be converted. The blue part is the conversion result that the LLM outputs. Next, rule-based processing transforms the cooking function sequence into the corresponding target conditions for each step within the PDDL description. Finally, classical symbolic planning over the PDDL description plans the complementary action steps so that the procedure can be executed in the real environment.
  • ...and 18 more figures
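
The rule-based step described in the Figure 5 caption (cooking function sequence to per-step PDDL target conditions) could look roughly like the sketch below. This is only an illustration under stated assumptions: the function names (`pour`, `turn_on_stove`, `boil`), the predicate names, the `GOAL_RULES` table, and the `kitchen` domain name are all hypothetical and are not taken from the paper, and the classical planner that would consume the generated problem text is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CookingFunction:
    """One step of a cooking function sequence (hypothetical format)."""
    name: str
    args: Tuple[str, ...]


# Hypothetical rule table: cooking function name -> PDDL goal template.
GOAL_RULES: Dict[str, str] = {
    "pour": "(in {0} {1})",
    "turn_on_stove": "(stove-on {0})",
    "boil": "(boiled {0})",
}


def to_goal(step: CookingFunction) -> str:
    """Map one cooking function to its PDDL target condition."""
    template = GOAL_RULES.get(step.name)
    if template is None:
        raise ValueError(f"no goal rule for function {step.name!r}")
    return template.format(*step.args)


def goals_for_sequence(sequence: List[CookingFunction]) -> List[str]:
    """Produce one target condition per step, in recipe order."""
    return [to_goal(step) for step in sequence]


def problem_text(name: str, domain: str, init: List[str], goal: str) -> str:
    """Format a minimal PDDL problem (the :objects section is omitted for brevity)."""
    return (
        f"(define (problem {name}) (:domain {domain})\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal {goal}))"
    )


sequence = [
    CookingFunction("pour", ("water", "pot")),
    CookingFunction("turn_on_stove", ("stove",)),
    CookingFunction("boil", ("water",)),
]
goals = goals_for_sequence(sequence)
first_problem = problem_text("step-0", "kitchen", ["(at robot sink)"], goals[0])
```

In this sketch each step yields its own planning problem, so a planner would fill in the complementary actions (moving, grasping, and so on) needed to reach that step's target condition from the current state.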