The Proof is in the Almond Cookies

Remi van Trijp, Katrien Beuls, Paul Van Eecke

TL;DR

The paper tackles the problem of enabling robots to understand and execute everyday cooking instructions in dynamic kitchens. It introduces a narrative-based, grounded language framework that treats recipes as narratives (fabula, plot, narration) and links language to executable robot plans via Fluid Construction Grammar and a personal dynamic memory, augmented by mental simulation. A case study on almond crescent cookies demonstrates how the approach handles challenges like zero anaphora, planning under uncertainty, and step-by-step execution, supported by a kitchen simulator and an integrative narrative network for self-assessment. The paper also releases a recipe execution benchmark with a language-agnostic representation, facilitating cross-system evaluation and encouraging learning-based grammar induction to scale the method, with potential societal benefits for human-centric AI in both domestic and professional kitchens.

Abstract

This paper presents a case study on how to process cooking recipes (and more generally, how-to instructions) in a way that makes it possible for a robot or artificial cooking assistant to support human chefs in the kitchen. Such AI assistants would be of great benefit to society, as they can help to sustain the autonomy of aging adults or people with a physical impairment, or they may reduce the stress in a professional kitchen. We propose a novel approach to computational recipe understanding that mimics the human sense-making process, which is narrative-based. Using an English recipe for almond crescent cookies as illustration, we show how recipes can be modelled as rich narrative structures by integrating various knowledge sources such as language processing, ontologies, and mental simulation. We show how such narrative structures can be used for (a) dealing with the challenges of recipe language, such as zero anaphora, (b) optimizing a robot's planning process, (c) measuring how well an AI system understands its current tasks, and (d) allowing recipe annotations to become language-independent.

Paper Structure (14 sections, 7 figures)

Figures (7)

  • Figure 1: An English recipe for almond crescent cookies, adapted from https://www.simplyrecipes.com/recipes/almond_crescent_cookies/.
  • Figure 2: A narrative is a three-layered structure consisting of a fabula, plot, and narration. Narrative-based understanding involves constructing the plot using the narration and the fabula, thereby integrating language processing, memory, mental simulation, perception, and so on.
  • Figure 3: This figure illustrates a single cycle in the construction of the recipe's plot. On the top left: while parsing, the language processor has access to the cooking agent's grammar, ontology, and the entities that are currently under its attention (accessible-entities$_i$). Comprehension results in a partial executable robot plan (here the operation portion-and-arrange). Through interaction with a kitchen simulator and the agent's personal dynamic memory, a complete plan is generated and executed, leading to a new plot beat (kitchen-state$_j$), which includes new accessible entities (tablespoons of dough). The cycle can then repeat itself with the next instruction until the recipe is finished.
  • Figure 4: A transient structure contains information about both the input sentence and the entities that are accessible from discourse context.
  • Figure 5: Parsing "116 grams sugar" leads to a partial robot plan that the cooking agent needs to complete into an executable one.
  • ...and 2 more figures
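
The cycle described in the Figure 3 caption (comprehend an instruction against the currently accessible entities, complete the resulting partial plan, execute it, and obtain a new plot beat) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `KitchenState`, `PartialPlan`, `TOY_GRAMMAR`, `comprehend`, `complete_and_execute`, and `run_recipe` are all hypothetical, and simple string matching stands in for Fluid Construction Grammar, the ontology, the kitchen simulator, and the personal dynamic memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KitchenState:
    """One plot beat: the entities the agent can currently refer to."""
    accessible_entities: tuple = ()


@dataclass
class PartialPlan:
    """An operation whose arguments may still be unresolved (None)."""
    operation: str
    arguments: dict = field(default_factory=dict)


# Hypothetical toy "grammar": verb -> (operation, input it needs, output it creates).
TOY_GRAMMAR = {
    "mix": ("mix", "ingredients", "dough"),
    "portion": ("portion-and-arrange", "dough", "tablespoons of dough"),
    "bake": ("bake", "tablespoons of dough", "cookies"),
}


def comprehend(instruction: str, state: KitchenState) -> PartialPlan:
    """Map an instruction to a partial plan, using the accessible entities."""
    verb = instruction.lower().split()[0]
    operation, needed, _ = TOY_GRAMMAR[verb]
    # Zero anaphora: the object may be unstated, so look it up in context.
    target = needed if needed in state.accessible_entities else None
    return PartialPlan(operation, {"target": target})


def complete_and_execute(plan: PartialPlan, state: KitchenState) -> KitchenState:
    """Stand-in for simulation and execution: reject unresolved plans,
    otherwise return a new state that also contains the operation's output."""
    if plan.arguments["target"] is None:
        raise ValueError(f"cannot resolve the target of {plan.operation!r}")
    output = next(out for op, _, out in TOY_GRAMMAR.values() if op == plan.operation)
    return KitchenState(state.accessible_entities + (output,))


def run_recipe(instructions, initial: KitchenState):
    """Repeat the cycle once per instruction, collecting the plot beats."""
    plot = [initial]
    for instruction in instructions:
        plan = comprehend(instruction, plot[-1])
        plot.append(complete_and_execute(plan, plot[-1]))
    return plot


plot = run_recipe(
    ["Mix the ingredients", "Portion", "Bake"],
    KitchenState(("ingredients",)),
)
print(plot[-1].accessible_entities)
```

In this sketch the instructions "Portion" and "Bake" name no object at all; the target is recovered from the entities made accessible by earlier steps, which is the role the caption assigns to accessible-entities$_i$. An instruction whose target is not yet accessible raises an error instead of producing a new kitchen state.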